Jupyter Notebook With Python for Data Science

Decoding Data Science Secrets using Python for New Age Business Analysts

Introduction

In my previous blog, I highlighted that while Business Analysts possess skills in handling data, their role extends far beyond deep data science and statistical modelling. Unlike Data Scientists, whose expertise lies in complex algorithms and predictive analytics, Business Analysts have broader business responsibilities, making strategic recommendations based on data-driven insights. This distinction does not diminish the significance of Data Scientists, as their contributions remain crucial in shaping advanced analytical solutions.

However, even without formal training in data science, a Business Analyst’s role still requires uncovering meaningful patterns and trends within business data. Business Analysts often work with large datasets, but not all of them know how to use tools like Python and Jupyter Notebook to extract information from data. As a result, they depend primarily on Data Scientists for data-driven analysis. Acquiring this technical knowledge can greatly enhance a Business Analyst’s data analysis capabilities.

As a Business Analyst, I faced similar challenges some years back, so I decided to crack the nut of data analysis using Jupyter Notebook with Python. I thought I should write down what I learned in the process so that fellow Business Analysts can benefit from it. Hence, this blog serves as a practical guide for Business Analysts new to Python and Jupyter Notebook. It walks through the easy and essential steps, from installation to hands-on data analysis commands, to help them use these tools effectively.

Jupyter Notebook with Python lets analysts work with data interactively: you run a command and immediately see its result. This makes it easier to manipulate datasets, visualise trends, and conduct exploratory analysis without deep coding expertise, which in turn streamlines the analytical process and helps extract insights that drive informed decisions. By the end of this guide, you’ll have a solid foundation to start using Python for data-driven decision-making.

Step 1: Installing Python

Before using Jupyter Notebook, the first step is to install Python—the programming language that powers Jupyter.

Downloading Python

  1. Visit the official Python website: https://www.python.org.
  2. Navigate to the “Downloads” section and select the latest stable version.
  3. Follow the installation instructions for your operating system (Windows/macOS/Linux).
  4. During installation on Windows, check the box that says “Add Python to PATH” (labelled “Add python.exe to PATH” in recent installers); this simplifies running Python from the command line.

Verifying Installation

After installing Python, you may need to restart your computer once. Then open a command prompt (Windows) or terminal (macOS/Linux) and type the following command to check that Python is installed correctly; it should start the Python interpreter, which you can leave by typing exit():

    python

Or, just to check the version number of the installed Python, type the following command:

    python --version

On macOS and Linux, the command may be python3 instead of python.

Step 2: Installing Jupyter Notebook Using pip

Python comes with pip, the package installer for Python. The pip command installs libraries from the Python Package Index, including those needed for Jupyter Notebook.

Installing JupyterLab vs Jupyter Notebook

Jupyter comes with two types of interfaces: JupyterLab and Jupyter Notebook. JupyterLab is an IDE (Integrated Development Environment) designed to be more extensive than Jupyter Notebook. It offers a highly interactive web interface that includes notebooks, consoles, terminals, CSV editors, markdown editors, interactive maps, and more. JupyterLab is used for workflows in data science, scientific computing, computational journalism, and machine learning (ML).

Its smaller sibling, Jupyter Notebook, offers a simple standalone web interface in which analysts can access data files and perform essential tasks like data inspection, cleaning and transformation, data visualisation through various plots, and running machine learning algorithms on data. Jupyter Notebook is widely used in the scientific community for documenting and sharing step-by-step data analysis, because a notebook keeps the commands and code together with their results, in sequence.

Now, it’s your personal choice based on your expertise whether you want to go with JupyterLab and use the notebook provided in it along with the other advanced features, or you want to go with the simplified Jupyter Notebook. I would suggest going with Jupyter Notebook first and exploring its capabilities before moving to the advanced tool JupyterLab.

To install JupyterLab, open a terminal or command prompt and type the following command:

    pip install jupyterlab

Or, to install Jupyter Notebook, type the following command:

    pip install notebook

These simple commands will download and install the respective Jupyter tool along with all the required dependencies.
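If you want to confirm from Python itself that the installation worked, a small check like the following may help. It is only a sketch: the helper name installed_version is my own and not part of Jupyter, and it assumes the tools were installed under their usual package names, jupyterlab and notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for illustration; not part of Jupyter.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "jupyterlab" and "notebook" are the package names used with pip install.
for name in ("jupyterlab", "notebook"):
    print(name, "->", installed_version(name) or "not installed")
```

Any package that prints "not installed" can be added with the corresponding pip command.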

Launching JupyterLab and Jupyter Notebook

After installation, you can launch either JupyterLab or Jupyter Notebook by running the respective command:

    jupyter lab

    jupyter notebook

These commands will open the respective tool in your default web browser, providing an interactive environment to write and execute the Python code.

If you opened JupyterLab, go to the “File” menu, select “New” and then “Notebook”, and choose the default Python kernel in the popup box. The notebook interface then opens, where you can write Python code and see the results.

Notebook in JupyterLab Interface

If you launched Jupyter Notebook, open the default “Untitled.ipynb” file and you will see the following interface, where you can write the Python code and see the results:

Jupyter Notebook Interface

Exiting JupyterLab and Jupyter Notebook

To close either JupyterLab or Jupyter Notebook properly, go to the “File” menu, select “Shut Down”, and confirm in the popup box. Once the tool has shut down, you return to the terminal or command prompt.

Back in the terminal or command prompt, if you see a prompt to update pip, type the following command:

    python -m pip install --upgrade pip

Step 3: Useful Jupyter Notebook Commands for Data Analysis

Jupyter Notebook provides a flexible coding environment where Business Analysts can manipulate and analyse data efficiently. Let’s go over some fundamental commands and essential steps that are useful for business data analysis:

1. Importing Libraries

Python has powerful libraries for data analysis. Use the import statement to load the libraries you need. A list of some essential libraries can be found in this link. Let’s import some commonly used ones:
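A typical set of imports might look like the sketch below. It assumes that pandas, NumPy, and Matplotlib are the libraries meant here and that they are already installed with pip; the short aliases pd, np, and plt are a common convention, not a requirement.

```python
import numpy as np               # numerical calculations
import pandas as pd              # tabular data (DataFrames)
import matplotlib.pyplot as plt  # plotting

# Print the versions to confirm that the imports worked.
print("pandas", pd.__version__)
print("numpy", np.__version__)
```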

These libraries help in handling datasets, performing numerical calculations, and visualising data.

2. Loading a Dataset

Business Analysts often work with datasets. Here’s how you can load data from a CSV file into a DataFrame using the Pandas library:
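The sketch below shows the idea with pandas’ read_csv function. To keep it runnable without any file on disk, it reads a few made-up rows from a text string; the column names and values are hypothetical, and with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a real file such as "sales.csv".
csv_text = """Product_Category,Region,Sales
Electronics,North,1200
Furniture,South,850
Clothing,East,430
Electronics,West,990
Furniture,North,1500
Clothing,South,610
"""

# read_csv accepts a file path or a file-like object.
# With a real file you would write: df = pd.read_csv("sales.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Show the first five rows of the DataFrame.
print(df.head())
```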

This will provide a quick preview of the dataset for analysis.

3. Basic Data Exploration

To analyse business data, it’s crucial to understand its structure. Some useful commands include:
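As an illustration, here are a few pandas commands I find useful for a first look at a dataset. The small DataFrame is made up for this sketch (including one deliberately missing value), so treat the column names as assumptions.

```python
import pandas as pd

# Hypothetical sample data; the None creates one missing Sales value.
df = pd.DataFrame({
    "Product_Category": ["Electronics", "Furniture", "Clothing",
                         "Electronics", "Furniture", "Clothing"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [1200, 850, None, 990, 1500, 610],
})

print(df.shape)           # (number of rows, number of columns)
df.info()                 # column names, data types, non-null counts
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # number of missing values per column
print(df["Product_Category"].value_counts())  # rows per category
```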

4. Filtering and Sorting Data

You can filter data based on specific conditions:
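For example, the following sketch keeps only the rows above a sales threshold and then sorts them from highest to lowest. The data and the threshold of 900 are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product_Category": ["Electronics", "Furniture", "Clothing",
                         "Electronics", "Furniture", "Clothing"],
    "Sales": [1200, 850, 430, 990, 1500, 610],
})

# Filter: keep only rows where Sales is above the chosen threshold.
high_value = df[df["Sales"] > 900]

# Sort: order the filtered rows from highest to lowest Sales.
sorted_df = high_value.sort_values(by="Sales", ascending=False)

print(sorted_df)
```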

These commands allow you to focus on high-value business data points for deeper insights.

5. Visualising Data

Data visualisation is crucial in business analysis. Let’s create a simple bar plot:
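One possible version is sketched below. It assumes a dataset with a product category column and a sales column; both names and all values are made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Product_Category": ["Electronics", "Furniture", "Clothing",
                         "Electronics", "Furniture", "Clothing"],
    "Sales": [1200, 850, 430, 990, 1500, 610],
})

# Add up the sales within each product category.
category_sales = df.groupby("Product_Category")["Sales"].sum()

# Draw one bar per category.
ax = category_sales.plot(kind="bar", title="Total Sales by Product Category")
ax.set_xlabel("Product Category")
ax.set_ylabel("Total Sales")
plt.tight_layout()
plt.show()
```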

This helps in understanding sales patterns across different product categories.

Some Practical Examples

This section gives a few quick examples to illustrate the capabilities of Python in Data Science using Jupyter Notebook.

For these hands-on examples, I downloaded a sample sales dataset in CSV format, sales_data_sample.csv, from the Kaggle website. I reviewed the dataset using the pandas library and plotted it with the matplotlib library. For this analysis, I first installed these libraries and then used the code shown to produce the outputs in the associated images. You may rerun the code at your end.

Sales Dataset Viewed in Jupyter Notebook
Data Description of the Sales Dataset

I then plotted the sales values in a histogram, which provides insights into their distribution in our dataset. I used the seaborn library along with matplotlib to plot a clean-looking histogram with a smooth density curve. You need to install the seaborn library using pip. The significance of the histogram is as follows:

  • Understanding Data Distribution: It shows how sales values are spread across different ranges.
  • Frequency Analysis: Each bar represents how many data points fall within a specific sales range.
  • Detecting Skewness: Helps identify whether the sales data is normally distributed, right-skewed (more lower sales), or left-skewed (more higher sales).
  • Finding Outliers: If certain bars are isolated far from others, they might indicate extreme values or anomalies.
  • Decision-Making: Useful for evaluating trends, such as whether most sales are concentrated within a specific range or spread evenly.

Histogram Plot of the Sales Data

As you can see in the dataset, it has multiple SALES records for a COUNTRY. So, we need to combine all sales per country and group them in a new dataset in memory that will be used in all future analysis.
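The grouping step can be sketched with pandas’ groupby, as below. The rows are made up for illustration; only the SALES and COUNTRY column names come from the dataset described above.

```python
import pandas as pd

# Hypothetical rows with several SALES records per COUNTRY.
sales = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "UK", "UK", "France", "France", "UK", "USA"],
    "SALES": [2500.0, 4100.0, 1800.0, 3200.0, 2900.0, 1500.0, 2200.0, 3600.0],
})

# Combine all sales per country into a new DataFrame held in memory.
country_sales = sales.groupby("COUNTRY", as_index=False)["SALES"].sum()

print(country_sales)
```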

Group by Country for all Sales

Now let’s see the total sales per country in a simple bar graph plot using the matplotlib library.
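A bar graph like this might be drawn as in the sketch below, again on made-up rows rather than the real file. Sorting the totals first is my own choice to make the chart easier to read.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows with several SALES records per COUNTRY.
sales = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "UK", "UK", "France", "France", "UK", "USA"],
    "SALES": [2500.0, 4100.0, 1800.0, 3200.0, 2900.0, 1500.0, 2200.0, 3600.0],
})

# Total sales per country, largest first.
country_sales = (
    sales.groupby("COUNTRY")["SALES"].sum().sort_values(ascending=False)
)

# One bar per country.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(country_sales.index, country_sales.values)
ax.set_title("Total Sales per Country")
ax.set_xlabel("Country")
ax.set_ylabel("Total Sales")
plt.tight_layout()
plt.show()
```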

Bar Plot Showing Total Sales Per Country

Now let’s plot maximum sales by country for each of the years in a bar chart.
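One way to get this chart is to group by country and year, take the maximum, and spread the years into columns so that each year gets its own bar. In this sketch the rows are made up, and the year column name YEAR_ID is an assumption about the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; YEAR_ID is an assumed column name.
sales = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "USA", "UK", "UK", "UK", "France", "France"],
    "YEAR_ID": [2003, 2004, 2004, 2003, 2004, 2004, 2003, 2004],
    "SALES": [2500.0, 4100.0, 3600.0, 1800.0, 3200.0, 2200.0, 2100.0, 2900.0],
})

# Largest single sale per country and year; one column per year.
max_sales = (
    sales.groupby(["COUNTRY", "YEAR_ID"])["SALES"].max().unstack("YEAR_ID")
)

# Grouped bar chart: countries on the x-axis, one bar per year.
ax = max_sales.plot(kind="bar", figsize=(8, 4))
ax.set_title("Maximum Sales by Country Each Year")
ax.set_xlabel("Country")
ax.set_ylabel("Maximum Sales")
plt.tight_layout()
plt.show()
```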

Maximum Sales by Country Each Year

Now, let’s plot the above plot in the form of a heat map using the seaborn library.
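A heat map of the same numbers could look like the following sketch. It uses seaborn’s heatmap function on a country-by-year table; the rows, the YEAR_ID column name, and the colour map are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; YEAR_ID is an assumed column name.
sales = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "USA", "UK", "UK", "UK", "France", "France"],
    "YEAR_ID": [2003, 2004, 2004, 2003, 2004, 2004, 2003, 2004],
    "SALES": [2500.0, 4100.0, 3600.0, 1800.0, 3200.0, 2200.0, 2100.0, 2900.0],
})

# Table of maximum sales: one row per country, one column per year.
pivot = sales.pivot_table(
    index="COUNTRY", columns="YEAR_ID", values="SALES", aggfunc="max"
)

# Darker cells mean higher maximum sales; annot=True prints the values.
ax = sns.heatmap(pivot, annot=True, fmt=".0f", cmap="YlGnBu")
ax.set_title("Maximum Sales by Country and Year")
plt.tight_layout()
plt.show()
```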

Maximum Sales Heatmap by Country and Year

As seen in the dataset, each country sells various products identified by the Product Code. Let’s see the maximum sales for each unique Product Code in the country UK.
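This can be sketched as a filter on the country followed by a group-by on the product code. The rows are made up, and I am assuming the product code column is named PRODUCTCODE.

```python
import pandas as pd

# Hypothetical rows; PRODUCTCODE is an assumed column name.
sales = pd.DataFrame({
    "COUNTRY": ["UK", "UK", "UK", "USA"],
    "PRODUCTCODE": ["S10_1678", "S18_3232", "S10_1678", "S18_3232"],
    "SALES": [1800.0, 3200.0, 2200.0, 4100.0],
})

# Keep only the UK rows.
uk_sales = sales[sales["COUNTRY"] == "UK"]

# Largest single sale for each unique product code, highest first.
uk_max = (
    uk_sales.groupby("PRODUCTCODE")["SALES"].max().sort_values(ascending=False)
)

print(uk_max)
```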

Maximum Sales for Each Unique Product Code in UK

Also, let’s see the total sales of a specific product with Product Code, say S18_3232, compared across different countries in a particular year, say 2004.
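A comparison like this might be built by combining two filter conditions and then summing per country, as in the sketch below. The rows are made up, and the PRODUCTCODE and YEAR_ID column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; PRODUCTCODE and YEAR_ID are assumed column names.
sales = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "UK", "France", "France", "USA"],
    "YEAR_ID": [2003, 2004, 2004, 2004, 2004, 2004],
    "PRODUCTCODE": ["S18_3232", "S18_3232", "S18_3232",
                    "S18_3232", "S10_1678", "S18_3232"],
    "SALES": [2500.0, 4100.0, 3200.0, 2900.0, 1500.0, 900.0],
})

# Both conditions must hold: the chosen product and the chosen year.
mask = (sales["PRODUCTCODE"] == "S18_3232") & (sales["YEAR_ID"] == 2004)

# Total sales of that product per country in that year.
product_sales = sales[mask].groupby("COUNTRY")["SALES"].sum()

ax = product_sales.plot(kind="bar", title="Sales of S18_3232 by Country, 2004")
ax.set_xlabel("Country")
ax.set_ylabel("Total Sales")
plt.tight_layout()
plt.show()
```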

Total Sales for a Specific Product Code by Different Countries in a Year

That’s all in this blog. These are just a few examples of various possibilities using Python. You may try similar analysis with other datasets.

Conclusion

With Python and Jupyter Notebook, Business Analysts can efficiently handle, analyse, and visualise data without requiring deep programming knowledge. This guide provided a starting point, covering installation, basic operations, and the most commonly used commands for business data analysis. As you become more familiar with Python, you can explore more advanced coding techniques to optimise your workflow and make data-driven decisions with confidence.
