Data visualization is the process of converting raw data into a graphical representation.
It is essential for businesses that need to assess current trends and patterns, and it helps management make decisions faster. Data presented through color, density, size, and shape lets us take in information quickly and draw conclusions about the present situation promptly. Data can be visualized with numerous tools, such as scatter plots, Mekko charts, heat maps, bubble clouds, Venn diagrams, and more.
Suppose you have a set of data arranged in a data frame in Python. Knowing how essential data visualization is, you wonder how to plot this data as a heatmap. Do you know which Python modules to use to create a heatmap?
This in-depth article first explains what a heatmap is, its benefits, and its best practices. Then we show you four different techniques to plot a heatmap using Python libraries.
We assume that you have basic knowledge of Python and that Python is installed on your system.
What is a Heatmap?
A heatmap is a graphical representation of data in which colors represent values. Sectors such as real estate, engineering, marketing, pharmaceuticals, and research use heatmaps for data analysis. For both simple and complex information, a heatmap is often easier to read than a plain chart or table. For example, businesses use heatmaps to visually analyze their sales, raw material usage, and financial data.
Why should you use a Heatmap?
Heatmaps offer several benefits to businesses and organizations that analyze data.
These benefits are:
- Enhances communication: A heatmap is an effective way to communicate a business’s current financial or operational situation, and it highlights where improvements can be made.
- Enhances time-based trend analysis: One of the most useful features of a heatmap is that it conveys changes over time visually. Organizations can see where and when their sales or other figures improved or declined, which helps them direct their sales and marketing efforts accordingly.
- Enhances competitive advantage: Heatmaps can help us study the competitive landscape of a market. By plotting numerical data in heatmaps, businesses can identify opportunities to increase sales in the locations where their competitors operate.
Best Practices for Heatmaps
Select the right color palette:
Color is the primary element in this type of chart, so it is crucial to select a color palette that matches the data. Often, as in the examples in this article, a lighter color represents a better result and a darker color a worse one.
Always include a legend:
As a general rule, any graph should include a legend, which gives the reader a reference for interpreting it.
In a heatmap, the legend is the color bar. The color bar shows the range of values as a gradient of colors.
Show the values in cells:
Displaying the value in each cell of the heatmap is an excellent idea because it makes each cell significantly easier to read. Otherwise, we have to look at the color bar each time to find the value for a specific color.
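As a rough illustration of these three practices together, the sketch below uses seaborn (one of the libraries covered later in this article) on a small made-up table. The numbers, the row labels, and the choice of the "YlGnBu" palette are assumptions for demonstration only, not part of the GDP data set.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line and call plt.show() to open a window

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration only (not real data).
demo = pd.DataFrame(
    {"2019": [2.1, -0.4, 3.0], "2020": [-3.2, -5.1, 1.5]},
    index=["Country A", "Country B", "Country C"],
)

ax = sns.heatmap(
    demo,
    cmap="YlGnBu",  # practice 1: a palette chosen to suit the data
    cbar=True,      # practice 2: the color bar acts as the legend
    annot=True,     # practice 3: print the value inside each cell
    fmt=".1f",      # show the printed values with one decimal place
)
ax.set_title("Best practices demo (made-up values)")
```

With the color bar and the annotations both present, a reader can get the overall pattern from the colors and the exact figures from the cell text.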
Current State: Pandas Data Frame:
Let us start with the present scenario, in which the data is stored in a CSV file.
Suppose you have saved a CSV file listing countries’ GDP growth rates for 12 years in your folder. With this data, you would like to analyze the GDP performance of various countries during the pre-COVID and COVID eras.
You can load the raw data from the CSV file into a pandas DataFrame with the following code.
import pandas as pd

# col-2,8,9,10,11,12,13,14,15
file = "/Users/.../30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(
    file,
    usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
    names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]", "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
    skiprows=1,
    index_col=0,
    nrows=20,
)  # Code A
pd.set_option("expand_frame_repr", False)  # Code B
df.dropna(inplace=True)  # Code C
print(df)  # Code D
Let’s see what the above code does.
- Import the pandas module under the alias pd for convenience.
- Create a variable file that holds the path of your CSV file.
- Read the CSV file into a DataFrame using the file variable.
- Extract only specific columns with the usecols parameter. Here you need only the country name and the years 2013 to 2020. The country name is at column index 2, and the year columns are at indexes 8 to 15.
- Specify the column names in the names list as shown in the above code snippet. Refer to Code A.
- Skip the first row of the file with skiprows=1. That row holds the original column titles, which are replaced by the names defined in Code A, so only the remaining rows are read as data.
- Set index_col=0 to use the first selected column, i.e., Country Name, as the row heading (index).
- Use nrows=20 to read only 20 rows of data.
- Stop pandas from wrapping a wide DataFrame across several lines, so that all columns are displayed side by side. See Code B.
- Remove the rows that contain blank values. See Code C.
- Print the data frame. See Code D.
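To try these read_csv options without the original file, here is a minimal sketch that reads a tiny in-memory CSV instead. The CSV text, its column layout, and the shortened column names are made up for illustration. Only the parameters (usecols, names, skiprows, index_col, nrows) and dropna mirror the steps above.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real file (values are not real data).
csv_text = """Series,Code,Country,Y2019,Y2020
GDP growth,X1,Country A,2.1,-3.2
GDP growth,X1,Country B,1.4,
GDP growth,X1,Country C,3.0,1.5
GDP growth,X1,Country D,0.9,-1.1
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[2, 3, 4],                       # keep only the country and year columns
    names=["Country Name", "2019", "2020"],  # our own names for the kept columns
    skiprows=1,                              # skip the original header row
    index_col=0,                             # first kept column becomes the index
    nrows=3,                                 # read only the first three data rows
)
df.dropna(inplace=True)                      # drop rows with missing values
print(df)
```

In this sketch, Country D is never read because of nrows=3, and Country B is dropped because its 2020 value is missing.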
See below for output:
This table doesn’t make it easy to determine which countries performed well or badly in GDP terms. You have to read through the values to find the best and worst performers. Plotting a heatmap makes it much easier to compare each country’s GDP growth rate.
Let us study four techniques to plot a heatmap using Python modules, step by step.
Technique 1: Employ Seaborn heatmap()
Seaborn is a Python library for creating statistical data visualizations. It integrates with pandas data frames and has numerous customization features. The library was created by Michael Waskom, Ph.D. With a few lines of Python code, we can produce the charts we need, so it is no longer necessary to plot graphs manually in Excel.
Seaborn supports different types of charts, such as scatter plots, line plots, histograms, bar plots, box plots, violin plots, scatterplot heatmaps, and heatmaps.
In this technique, we use the seaborn.heatmap() function to create a heatmap for analysis.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# col-2,8,9,10,11,12,13,14,15
file = "/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(
    file,
    usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
    names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]", "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
    skiprows=1,
    index_col=0,
    nrows=20,
)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
s = sns.heatmap(df)  # Code A
plt.title("GDP Annual Growth Rate")  # Code C
plt.show()  # Code B
Let us see how the above code works:
- Import the pandas, seaborn, and matplotlib.pyplot modules under alias names. (Don’t forget to install these modules before importing.)
- Create the heatmap plot with sns.heatmap(). Refer to Code A.
- Set the title of the heatmap plot. Refer to Code C.
- Display the heatmap of the countries’ GDP growth rates on screen. Refer to Code B.
See below for output:
Let us see how we can customize the heatmap using the following features:
1. annot=True: displays the value in each cell. The GDP growth rate of each country is displayed, so you can read it without looking at the color bar. Refer to the code and image below:
s = sns.heatmap(df, annot=True)
2. linewidths=.5: draws a line between cells. The line thickness is .5. Refer to the code and image below, where each cell is outlined.
s = sns.heatmap(df, linewidths=.5)
3. vmin, vmax: these set the limits of the values in the color map. Set vmin=1 and vmax=5 to spread the colors over GDP growth rates between 1 and 5. Values outside that range are drawn in the colors at the ends of the scale.
Refer to the code and image below:
s = sns.heatmap(df, vmin=1, vmax=5)
4. cmap="YlGnBu": changes the colors of the heatmap using a named color map. In the image below, the map runs from yellow through green to blue. Refer to the code and image below:
s = sns.heatmap(df, cmap="YlGnBu")
5. linecolor: changes the color of the lines between cells. Refer to the code and image below.
s = sns.heatmap(df, linewidths=.1, linecolor="red")
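These options can also be combined in a single call. The sketch below does so on a small made-up data frame, so it runs without the CSV file. The values and labels are assumptions for illustration, and linewidths is the parameter name documented for seaborn.heatmap().

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line and call plt.show() to open a window

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up growth rates for illustration only (not real data).
demo = pd.DataFrame(
    {"[YR2019]": [2.1, 4.8, 0.3], "[YR2020]": [-3.2, 1.9, 6.4]},
    index=["Country A", "Country B", "Country C"],
)

ax = sns.heatmap(
    demo,
    annot=True,       # print the value in each cell
    linewidths=0.5,   # thickness of the lines between cells
    linecolor="red",  # color of the lines between cells
    vmin=1,           # values at or below 1 get the lowest color
    vmax=5,           # values at or above 5 get the highest color
    cmap="YlGnBu",    # yellow-green-blue color map
)
ax.set_title("GDP Annual Growth Rate (made-up values)")
```

Because vmin and vmax only change the color mapping, the annotations still show the true values, including those outside the 1 to 5 range.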
Technique 2: Employ matplotlib.pyplot
Matplotlib is a Python package for creating static, animated, and interactive visualizations. The Matplotlib library was originally developed by John Hunter.
It can export plots to many file formats, and plots can be customized extensively. At the time of writing, the latest version of Matplotlib is 3.5.0, released on November 15, 2021. With pyplot, figures and axes are created automatically with a few lines of code, whereas with Matplotlib’s object-oriented interface you create the figure and axes explicitly. Matplotlib can plot basic arrays, statistics, and unstructured coordinate data.
In this technique, you create a heatmap using the matplotlib.pyplot Python module.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# col-2,8,9,10,11,12,13,14,15
file = "/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(
    file,
    usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
    names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]", "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
    skiprows=1,
    index_col=0,
    nrows=20,
)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
plt.pcolor(df)  # Code A
plt.yticks(np.arange(0.5, len(df.index), 1), df.index)  # Code B
plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)  # Code C
plt.title("GDP Annual Growth Rate")
plt.show()  # Code D
The points below show how the above code works:
- Import the numpy, pandas, and matplotlib.pyplot modules under alias names. (Don’t forget to install these modules before importing.)
- Create a heatmap plot from the data frame. Refer to Code A.
- In Code B, yticks sets the labels and locations of the ticks on the y-axis. The np.arange function returns evenly spaced values within an interval. Here, np.arange(0.5, len(df.index), 1) places a tick at the middle of each row, and df.index supplies the country names as labels.
- Similarly, in Code C, xticks sets the labels and locations of the ticks on the x-axis. The year labels from df.columns are placed at the middle of each column using np.arange.
- Code D opens a new window and displays your heatmap.
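To see why the ticks start at 0.5, note that pcolor draws cell i between positions i and i + 1, so its center lies at i + 0.5. The short sketch below prints the tick positions for a hypothetical data frame with four rows. The row count is an assumption for illustration.

```python
import numpy as np

n_rows = 4  # hypothetical number of rows in the data frame

# Start at 0.5, stop before n_rows, step by 1: one position per row center.
tick_positions = np.arange(0.5, n_rows, 1)
print(tick_positions)  # [0.5 1.5 2.5 3.5]
```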
Customize the plots using the elements below.
1. edgecolors='yellow': changes the color of the cell borders to yellow. You can use any color you prefer by name, such as blue or red. Refer to the code and image below:
plt.pcolor(df, edgecolors='yellow')
2. cmap='RdBu': displays a red-to-blue color map. Red represents lower (worse) values, and blue represents higher (better) values. Refer to the code and image below:
plt.pcolor(df, cmap='RdBu')
3. vmin, vmax: these set the limits of the values in the color map. Set vmin=-2 and vmax=1 to spread the colors of the color bar over GDP growth rates between -2 and 1. Values outside that range are drawn in the colors at the ends of the scale. Refer to the code and image below:
plt.pcolor(df, vmin=-2, vmax=1)
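The following sketch puts these pyplot options together and adds a color bar, in line with the best practice of always including a legend. It uses a small made-up data frame so that it runs on its own. The values, labels, and the color bar label text are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line and call plt.show() to open a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up growth rates for illustration only (not real data).
demo = pd.DataFrame(
    {"[YR2019]": [0.8, -1.5, 2.4], "[YR2020]": [-3.2, -0.6, 0.2]},
    index=["Country A", "Country B", "Country C"],
)

# Yellow cell borders, red-to-blue color map, colors spread over -2..1.
mesh = plt.pcolor(demo, edgecolors="yellow", cmap="RdBu", vmin=-2, vmax=1)

# Label each row and column at the center of its cells.
plt.yticks(np.arange(0.5, len(demo.index), 1), demo.index)
plt.xticks(np.arange(0.5, len(demo.columns), 1), demo.columns)
plt.title("GDP Annual Growth Rate (made-up values)")

# The color bar serves as the legend of the heatmap.
plt.colorbar(mesh, label="GDP growth (%)")
```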
Technique 3: Employ plotly.express
Plotly is an open-source Python graphing library for creating interactive graphs. It also enables the development of web-based visualizations. With this library, we can make basic charts, statistical charts, scientific charts, financial charts, maps, and 3D charts.
The plotly.express module contains functions for creating most of these charts and graphs. It is a built-in part of the Plotly library.
Here we will use the imshow function to create a heatmap. The same function also displays image data.
import pandas as pd
import plotly.express as px

# col-2,8,9,10,11,12,13,14,15
file = "/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(
    file,
    usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
    names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]", "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
    skiprows=1,
    index_col=0,
    nrows=20,
)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
fig = px.imshow(df, labels=dict(x="Year", color="GDP%"))  # Code A
fig.layout.title = "GDP Annual Growth Rate"  # Code B
fig.show()
Let us see how the above code works:
- Import pandas and plotly.express under alias names. (Don’t forget to install these modules before importing.)
- Create a heatmap image with the imshow function. The label for the x-axis and the name of the color bar are defined. Refer to Code A.
- Set the title of the heatmap image. Refer to Code B.
- The .show() command opens the heatmap in a new browser window.
See below for output:
The following segment shows the customization elements for this module.
color_continuous_scale="magenta": displays the heatmap image in magenta. Refer to the code and image below.
We can choose the color scale from the following list:
['aggrnyl', 'agsunset', 'algae', 'amp', 'armyrose', 'balance', 'blackbody', 'bluered', 'blues', 'blugrn', 'bluyl', 'brbg', 'brwnyl', 'bugn', 'bupu', 'burg', 'burgyl', 'cividis', 'curl', 'darkmint', 'deep', 'delta', 'dense', 'earth', 'edge', 'electric', 'emrld', 'fall', 'geyser', 'gnbu', 'gray', 'greens', 'greys', 'haline', 'hot', 'hsv', 'ice', 'icefire', 'inferno', 'jet', 'magenta', 'magma', 'matter', 'mint', 'mrybm', 'mygbm', 'oranges', 'orrd', 'oryel', 'oxy', 'peach', 'phase', 'picnic', 'pinkyl', 'piyg', 'plasma', 'plotly3', 'portland', 'prgn', 'pubu', 'pubugn', 'puor', 'purd', 'purp', 'purples', 'purpor', 'rainbow', 'rdbu', 'rdgy', 'rdpu', 'rdylbu', 'rdylgn', 'redor', 'reds', 'solar', 'spectral', 'speed', 'sunset', 'sunsetdark', 'teal', 'tealgrn', 'tealrose', 'tempo', 'temps', 'thermal', 'tropic', 'turbid', 'turbo', 'twilight', 'viridis', 'ylgn', 'ylgnbu', 'ylorbr', 'ylorrd']
fig = px.imshow(df,labels=dict(x= "Year",color= "GDP%"), color_continuous_scale= "magenta")
fig.update_layout(coloraxis_showscale=False): the color scale (color bar) will disappear.
fig.update_xaxes(showticklabels=False): the x-axis labels will not be displayed.
fig.update_yaxes(showticklabels=False): the y-axis labels will not be displayed.
Technique 4: Employ Clustergrammer
Clustergrammer is a web-based tool for visualizing high-dimensional data as an interactive heatmap. The package uses JavaScript and Python.
The tool was developed by the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai. The library is free and open-source. The output works only in Jupyter notebooks.
To use Clustergrammer, install the following packages:
1. Jupyter notebook:
pip install notebook
2. Jupyter widget dependencies (NumPy, SciPy, pandas):
pip install numpy
pip install scipy
pip install pandas
3. ipywidgets:
pip install ipywidgets
Clustergrammer can be installed and enabled using the following commands:
pip install clustergrammer2
jupyter nbextension install --py --sys-prefix clustergrammer2
jupyter nbextension enable --py --sys-prefix clustergrammer2
The code below creates a heatmap using clustergrammer2.
import numpy as np
import pandas as pd
from clustergrammer2 import net

# col-2,8,9,10,11,12,13,14,15
file = "/Users/.../Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(
    file,
    usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
    names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]", "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
    skiprows=1,
    index_col=0,
    nrows=20,
)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
# load DataFrame
net.load_df(df)  # Code A
# cluster using default parameters
net.cluster()  # Code B
# make interactive widget
net.widget()  # Code C
Let us see what the above code does:
- Import numpy and pandas under alias names, and import the net object from the clustergrammer2 library.
- Load the data frame into the network object. The net object can load data, filter, normalize, cluster, and render the widget. Refer to Code A.
- Cluster the data using default parameters. Refer to Code B.
- Make the interactive widget. Refer to Code C.
The interactive widget is displayed in the Jupyter notebook, not in the IDLE window.
Let us learn how to work with a Jupyter notebook in this section.
In the terminal window, type the following command:
jupyter notebook
A new browser window will open where you can access the Jupyter notebook.
In the right-hand corner, click the New menu, then click Python 3, as shown in the image below.
A new Python notebook window opens, as shown in the image below:
Paste the code into the input cell as shown below and click the Run button.
You can see the heatmap widget below:
The Clustergrammer widget offers several interactive features to work with.
Summary
A heatmap is a data visualization tool that helps you interpret data quickly. Each cell or box represents one value and is drawn in a color ranging from light to dark. In the examples here, a darker color indicates a worse value and a lighter color a better one. This article covered four ways to plot heatmaps in Python: seaborn.heatmap(), matplotlib.pyplot, plotly.express, and clustergrammer.
The simplest is seaborn.heatmap(), as the code is shorter and easier to understand. But the other modules have their own benefits and numerous features.
So now you are familiar with creating great heatmaps in Python using various modules. Now take action: make heatmaps using all of these modules and send me your feedback at thoufeeq87.mtr (at) gmail (dot) com.