How to Make a Heatmap Using a Pandas DataFrame

Data visualization is the process of converting raw data into a graphical representation.

It is essential for businesses that need to assess current trends and patterns, and it helps management make decisions faster. Data presented through color, density, size, and shape lets us take in information quickly and draw conclusions about the present situation promptly. Common visualization tools include scatter plots, Mekko charts, heatmaps, bubble clouds, Venn diagrams, and more.

Suppose you have a set of data arranged in a DataFrame in Python. Knowing how essential data visualization is, you wonder how to plot this data as a heatmap. Do you know which Python modules to use for creating heatmaps?

This in-depth article first explains what a heatmap is, its benefits, and best practices for using it. Then we will show you four different techniques to plot a heatmap using Python libraries.

We assume that you have a basic knowledge of Python and that Python is installed on your system.

What Is a Heatmap?

A heatmap is a graphical representation of data in which colors represent values. Many sectors, including real estate, engineering, marketing, pharmaceuticals, and research, use heatmaps for data analysis. Compared with plain tables, heatmaps make patterns in both simple and complex data much easier to spot. For example, businesses use heatmaps to visually analyze their sales, raw material usage, and financial data.

Image 1

Why Should You Use a Heatmap?

Heatmaps offer many benefits for analyzing business and organizational data.

These benefits include:

  • Enhances communication: A heatmap is an effective tool for communicating a business's current financial or operational situation, and it highlights where improvements can be made.
  • Enhances time-based trend analysis: A heatmap's most striking feature is its ability to show changes over time visually. Organizations can see where and when their sales or other metrics improved or declined, which helps them plan sales and marketing efforts accordingly.
  • Enhances competitive advantage: Heatmaps can help us study the competitive landscape of a market. By mapping numerical data, businesses can identify opportunities to increase sales in their competitors' locations.

Heatmap Best Practices

Select the right color palette:

Color is the primary element in this type of chart, so it is crucial to select a color palette that matches the data. Often, lighter colors represent better results and darker colors represent worse ones.
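Growth-rate data can be negative or positive. One way to match the palette to such data is a diverging color map whose neutral midpoint sits at zero. The sketch below assumes Matplotlib's TwoSlopeNorm and its "RdYlGn" color map. The country names and numbers are made-up placeholders, not real GDP figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without opening a window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import TwoSlopeNorm

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=["Country A", "Country B", "Country C"],
)

# Diverging palette: red for negative growth, green for positive growth.
# TwoSlopeNorm pins the palette's neutral midpoint to 0 %.
norm = TwoSlopeNorm(vmin=df.values.min(), vcenter=0, vmax=df.values.max())

fig, ax = plt.subplots()
mesh = ax.pcolormesh(df.values, cmap="RdYlGn", norm=norm)
# fig.savefig("palette_demo.png") or plt.show() to view the result
```

"RdYlGn" puts red at the low end. If you want the opposite convention, you can reverse any named Matplotlib color map by appending "_r" (for example "RdYlGn_r").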

Always include a legend:

The general rule for any graph is to include a legend, because it gives readers the reference details they need.

In a heatmap, the legend is the color bar, which shows how the range of values maps to shades of color.
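In Matplotlib, you add the color bar with colorbar(), and a label on it tells readers what the colors measure. Here is a minimal sketch that uses a hypothetical grid of values and an assumed "GDP growth (%)" label:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3 x 4 grid of values, for illustration only
values = np.array([[1.0, 2.5, 3.0, 0.5],
                   [2.0, -1.0, 4.5, 3.5],
                   [0.0, 1.5, 2.0, 5.0]])

fig, ax = plt.subplots()
mesh = ax.pcolormesh(values, cmap="viridis")
# The color bar is the heatmap's legend; label it with the quantity and unit
cbar = fig.colorbar(mesh, ax=ax, label="GDP growth (%)")
```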

Show the values in cells:

Displaying the value in each cell of the heatmap is an excellent idea, because it makes each cell much easier to read. Otherwise, readers have to check the color bar every time to estimate the value behind a particular color.
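Seaborn can do this for you with annot=True, which is covered in Technique 1. With plain Matplotlib, one rough approach is to write each value at the center of its cell with Axes.text(). The data below is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; use plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=["Country A", "Country B", "Country C"],
)

fig, ax = plt.subplots()
ax.pcolormesh(df.values, cmap="viridis")
# Cell (i, j) spans x in [j, j+1] and y in [i, i+1],
# so its center is at (j + 0.5, i + 0.5).
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        ax.text(j + 0.5, i + 0.5, f"{df.iat[i, j]:.1f}",
                ha="center", va="center", color="white")
```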

Current State: Pandas Data Frame:

Let us start with the current scenario: data stored in a CSV file.

Suppose you have saved a CSV file listing countries' annual GDP growth rates over 12 years. With this data, you would like to analyze the GDP performance of various countries during the pre-COVID and COVID eras.

You can load the raw data from the CSV file into a pandas DataFrame with the following code:

import pandas as pd
# col-2,8,9,10,11,12,13,14,15
file="/Users/.../30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(file, usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
                 names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]",
                        "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
                 skiprows=1, index_col=0, nrows=20) # Code A
pd.set_option("expand_frame_repr", False) # Code B
df.dropna(inplace=True) # Code C
print(df) # Code D

Let’s see what the above code does.

  1. Import the pandas module under the alias pd for convenience.
  2. Create a variable named file that holds the path to your CSV file.
  3. Read the CSV file into a DataFrame using the file variable.
  4. Use the usecols parameter to extract only the columns you need: the country name and the years 2013 to 2020. Their column indices are 2, 8, 9, 10, 11, 12, 13, 14, and 15.
  5. Specify the column names in the names list, as shown in Code A.
  6. Skip the first row of the file with skiprows=1. That row holds the file's original headers, and the column names are already supplied through names in Code A, so only the remaining data rows should be read.
  7. Set index_col=0 to use the first selected column, Country Name, as the row labels.
  8. Use nrows=20 to read only 20 rows of data.
  9. Stop the printed DataFrame from wrapping across lines, so all columns appear side by side. See Code B.
  10. Remove rows with blank values. See Code C.
  11. Print the DataFrame. See Code D.

See below for output:

Image 2
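To experiment with these read_csv options without the World Bank file, you can give pandas a small in-memory CSV. The sketch below uses made-up countries and numbers. Its simplified column layout is an assumption and does not reproduce the real file. It also shows the na_values parameter. If your export marks missing values with a placeholder string such as "..", pandas reads that placeholder as text unless you declare it, so dropna() cannot remove those rows.

```python
import io
import pandas as pd

# A tiny, made-up CSV: a header row, two columns we do not need,
# and ".." standing in for a missing value.
raw = io.StringIO(
    "Series,Code,Country,CountryCode,2019,2020\n"
    "GDP growth,NY,Country A,AAA,2.1,-3.6\n"
    "GDP growth,NY,Country B,BBB,..,-6.1\n"
    "GDP growth,NY,Country C,CCC,3.4,1.2\n"
)

df = pd.read_csv(
    raw,
    usecols=[2, 4, 5],                               # country name and two year columns
    names=["Country Name", "[YR2019]", "[YR2020]"],  # our own names for those columns
    skiprows=1,                                      # skip the file's header row
    index_col=0,                                     # country name becomes the row index
    nrows=3,                                         # read at most three data rows
    na_values="..",                                  # treat ".." as a missing value
)
df.dropna(inplace=True)  # Country B has a missing 2019 value and is dropped
print(df)
```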

This table makes it hard to see which countries performed well or poorly in terms of GDP growth. You have to read through every value to find the best and worst performers. A heatmap makes each country's GDP growth performance much easier to identify.

Let us study, step by step, four techniques for plotting a heatmap with Python modules.

Technique 1:  Employ Seaborn heatmap()

Seaborn is a Python data visualization library for creating graphical charts. It integrates with pandas DataFrames and offers numerous customization features. Seaborn was created by Michael Waskom, Ph.D. With a few lines of Python code, we can produce charts that meet our requirements, so there is no need to plot graphs manually in Excel.

Seaborn supports many chart types, such as scatter plots, line plots, histograms, bar plots, box plots, violin plots, scatterplot heatmaps, and heatmaps.

In this technique, we use the seaborn.heatmap() function to create a heatmap for analysis.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# col-2,8,9,10,11,12,13,14,15
file="/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(file, usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
                 names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]",
                        "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
                 skiprows=1, index_col=0, nrows=20)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
s = sns.heatmap(df) # Code A
plt.title("GDP Annual Growth Rate") # Code C
plt.show() # Code B

Let us see how the above code works:

  1. Import the pandas, seaborn, and matplotlib.pyplot modules under their usual aliases. (Don't forget to install these modules before importing them.)
  2. Create the heatmap plot. Refer to Code A.
  3. Set the title of the heatmap. Refer to Code C.
  4. Display the heatmap of the countries' GDP growth rates on screen. Refer to Code B.

See below for output:

Image 3
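You may want to save the chart to a file instead of, or as well as, showing it in a window. The following sketch draws the seaborn heatmap onto an explicit Matplotlib Axes and writes it to a PNG file. The DataFrame holds hypothetical placeholder values, and the file name gdp_heatmap.png is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; call plt.show() instead when working interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="Country Name"),
)

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(df, ax=ax)                   # draw into an explicit Axes
ax.set_title("GDP Annual Growth Rate")
fig.tight_layout()                       # keep long country names from being clipped
fig.savefig("gdp_heatmap.png", dpi=150)  # write the chart to an image file
```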

Let us see how we can customize the heatmap using the following features:

1. annot=True: writes each cell's value inside the cell, so you can read every country's GDP growth rate without looking at the color bar. Refer to the code and image below:

s = sns.heatmap(df,annot=True)
Image 4

2. linewidths=.5: draws lines 0.5 points wide between the cells. Refer to the code and image below, where each cell is outlined:

s = sns.heatmap(df, linewidths=.5)
Image 5

3. vmin and vmax: set the range of values that the color map spans. With vmin=1 and vmax=5, the colors cover GDP growth rates from 1 to 5. Rates below 1 take the lowest color, and rates above 5 take the highest color.

Refer to code and Image below:

s = sns.heatmap(df, vmin=1, vmax=5)
Image 6

4. cmap="YlGnBu": changes the color map. "YlGnBu" runs from yellow through green to blue, and in the image below the map appears mostly blue. Refer to the code and image below:

s = sns.heatmap(df, cmap="YlGnBu")
Image 7

5. linecolor: changes the color of the lines between cells. Use it together with linewidths. Refer to the code and image below.

s = sns.heatmap(df, linewidths=.1, linecolor="red")
Image 8
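You can also combine these options in a single call. The sketch below uses placeholder data. It also uses three seaborn.heatmap() parameters not discussed above: fmt formats the cell labels (here to one decimal place), center centers a diverging palette on 0 %, and cbar_kws labels the color bar.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; call plt.show() instead when working interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=["Country A", "Country B", "Country C"],
)

ax = sns.heatmap(
    df,
    annot=True, fmt=".1f",                 # print each value with one decimal place
    linewidths=0.5, linecolor="white",     # thin white grid between cells
    cmap="RdYlGn", center=0,               # diverging palette: red below 0 %, green above
    cbar_kws={"label": "GDP growth (%)"},  # label the color bar (the legend)
)
ax.set_title("GDP Annual Growth Rate")
```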

Technique 2:  Employ matplotlib.pyplot

Matplotlib is a Python package for creating static, animated, and interactive visualizations. John D. Hunter originally developed the Matplotlib library.

It can export plots in many file formats, and its plots are highly customizable. At the time of writing, the latest version of Matplotlib is 3.5.0, released on November 15, 2021. With pyplot, figures and axes are created automatically with a few lines of code. With Matplotlib's object-oriented interface, by contrast, you create the figure and its axes explicitly. Matplotlib can plot basic arrays, statistics, and unstructured coordinate data.

In this technique, you create a heatmap with the matplotlib.pyplot module.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# col-2,8,9,10,11,12,13,14,15
file="/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(file, usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
                 names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]",
                        "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
                 skiprows=1, index_col=0, nrows=20)

pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
plt.pcolor(df) # Code A
plt.yticks(np.arange(0.5, len(df.index), 1), df.index) # Code B
plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns) # Code C
plt.colorbar() # add a color bar as the legend
plt.title("GDP Annual Growth Rate")
plt.show() # Code D

The below points will show how the above code functions:

  1. Import the numpy, pandas, and matplotlib.pyplot modules under their usual aliases. (Don't forget to install these modules before importing them.)
  2. Create a heatmap plot from the DataFrame. Refer to Code A.
  3. In Code B, plt.yticks() sets the positions and labels of the y-axis ticks. np.arange(0.5, len(df.index), 1) returns evenly spaced positions 0.5, 1.5, 2.5, and so on, which are the centers of the rows, so each country name sits in the middle of its row.
  4. Similarly, in Code C, plt.xticks() sets the positions and labels of the x-axis ticks, placing each year label from df.columns at the center of its column.
  5. Code D opens a new window and displays your heatmap.
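For comparison, here is a sketch of the same kind of plot written with Matplotlib's object-oriented interface. In this style, you create the figure and axes yourself and call methods on the Axes object. The placeholder data is hypothetical. invert_yaxis() is an optional extra: pcolor draws row 0 at the bottom, and inverting the axis puts the DataFrame's first row at the top.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=["Country A", "Country B", "Country C"],
)

fig, ax = plt.subplots()               # create the Figure and Axes explicitly
mesh = ax.pcolor(df.to_numpy())        # Axes-method counterpart of plt.pcolor(df)
# Cell centers sit at 0.5, 1.5, 2.5, ..., like the np.arange ticks in Code B and Code C
ax.set_yticks(np.arange(0.5, len(df.index), 1))
ax.set_yticklabels(list(df.index))
ax.set_xticks(np.arange(0.5, len(df.columns), 1))
ax.set_xticklabels(list(df.columns))
ax.invert_yaxis()                      # optional: first DataFrame row at the top
ax.set_title("GDP Annual Growth Rate")
fig.colorbar(mesh, ax=ax)              # the color bar acts as the legend
# fig.savefig("gdp_heatmap_oo.png") or plt.show() to view the result
```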

Customize the plot using the following elements:

1. edgecolors='yellow': changes the color of the cell borders to yellow. You can use any color name you prefer, such as 'blue' or 'red'. Refer to the code and image below:

plt.pcolor(df,edgecolors='yellow')
Image 9

2. cmap='RdBu': uses a red-to-blue color map. Red represents lower values (worse results), and blue represents higher values (better results). Refer to the code and image below:

plt.pcolor(df,cmap='RdBu')
Image 10

3. vmin, vmax: set the range of values that the color map spans. With vmin=-2 and vmax=1, the colors cover GDP growth rates from -2 to 1, as shown in the color bar. Values outside that range take the end colors. Refer to the code and image below:

plt.pcolor(df, vmin=-2, vmax=1)
Image 11
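You can combine the three options in one plt.pcolor() call. The sketch below uses hypothetical data. It also passes extend="both" to colorbar(), an addition not covered above. That argument draws arrow-shaped ends on the color bar to signal that some values fall outside the vmin-to-vmax range.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; call plt.show() instead when working interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2019]": [2.1, -0.5, 3.4], "[YR2020]": [-3.6, -6.1, 1.2]},
    index=["Country A", "Country B", "Country C"],
)

# Yellow cell borders, red-to-blue palette, colors spanning -2 % to 1 %
mesh = plt.pcolor(df.to_numpy(), edgecolors="yellow", cmap="RdBu", vmin=-2, vmax=1)
cbar = plt.colorbar(mesh, extend="both")  # arrows mark values clipped below -2 or above 1
plt.title("GDP Annual Growth Rate")
```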

Technique 3:  Employ plotly.express

Plotly is an open-source graphing library for Python that creates interactive graphs and supports web-based visualizations. With it, we can make basic, statistical, scientific, and financial charts, as well as maps and 3D charts.

The plotly.express module, which is part of the Plotly library, provides high-level functions that create most common charts and graphs.

Here we will use the px.imshow() function, which displays image data and can also render a DataFrame as a heatmap.

import pandas as pd
import plotly.express as px
# col-2,8,9,10,11,12,13,14,15
file="/Users/mohamedthoufeeq/Downloads/Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(file, usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
                 names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]",
                        "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
                 skiprows=1, index_col=0, nrows=20)
pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)
fig = px.imshow(df, labels=dict(x="Year", color="GDP%")) # Code A
fig.layout.title = "GDP Annual Growth Rate" # Code B
fig.show()

Here is how the above code works:

  1. Import pandas and plotly.express under their usual aliases. (Don't forget to install these modules before importing them.)
  2. Create a heatmap image with the imshow() function, defining the x-axis label and the color bar title. Refer to Code A.
  3. Set the title of the heatmap. Refer to Code B.
  4. fig.show() displays the interactive heatmap, opening it in your web browser when you run the code as a script.

See below for output:

Image 12

The following segment will show you the customization elements for this module.

color_continuous_scale="magenta": displays the heatmap in shades of magenta. Refer to the code and Image 13 below:

You can choose any of the following named color scales:

['aggrnyl', 'agsunset', 'algae', 'amp', 'armyrose', 'balance',
'blackbody', 'bluered', 'blues', 'blugrn', 'bluyl', 'brbg',
'brwnyl', 'bugn', 'bupu', 'burg', 'burgyl', 'cividis', 'curl',
'darkmint', 'deep', 'delta', 'dense', 'earth', 'edge', 'electric',
'emrld', 'fall', 'geyser', 'gnbu', 'gray', 'greens', 'greys',
'haline', 'hot', 'hsv', 'ice', 'icefire', 'inferno', 'jet',
'magenta', 'magma', 'matter', 'mint', 'mrybm', 'mygbm', 'oranges',
'orrd', 'oryel', 'oxy', 'peach', 'phase', 'picnic', 'pinkyl',
'piyg', 'plasma', 'plotly3', 'portland', 'prgn', 'pubu', 'pubugn',
'puor', 'purd', 'purp', 'purples', 'purpor', 'rainbow', 'rdbu',
'rdgy', 'rdpu', 'rdylbu', 'rdylgn', 'redor', 'reds', 'solar',
'spectral', 'speed', 'sunset', 'sunsetdark', 'teal', 'tealgrn',
'tealrose', 'tempo', 'temps', 'thermal', 'tropic', 'turbid',
'turbo', 'twilight', 'viridis', 'ylgn', 'ylgnbu', 'ylorbr',
'ylorrd']
fig = px.imshow(df,labels=dict(x= "Year",color= "GDP%"), color_continuous_scale= "magenta")
Image 13

  • fig.update_layout(coloraxis_showscale=False): hides the color scale.
  • fig.update_xaxes(showticklabels=False): hides the x-axis tick labels.
  • fig.update_yaxes(showticklabels=False): hides the y-axis tick labels.

Technique 4:  Employ Clustergrammer

Clustergrammer is a web-based tool for visualizing high-dimensional data as interactive, hierarchically clustered heatmaps. The package combines JavaScript and Python.

The Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai developed this tool. The library is free and open source, and its interactive output works only in Jupyter notebooks.

To use Clustergrammer, install the following packages:

1. Jupyter Notebook:

pip install notebook

2. NumPy, SciPy, and pandas:

pip install numpy
pip install scipy
pip install pandas

3. ipywidgets:

pip install ipywidgets

Clustergrammer2 can be installed and enabled with the following commands:

pip install clustergrammer2
jupyter nbextension install --py --sys-prefix clustergrammer2
jupyter nbextension enable --py --sys-prefix clustergrammer2

The code below creates a heatmap using clustergrammer2:

import numpy as np
import pandas as pd
from clustergrammer2 import net

# col-2,8,9,10,11,12,13,14,15
file="/Users/.../Data_Extract_From_World_Development_Indicators/30266bf0-d3a6-440a-ae25-f0d47350d321_Data.csv"
df = pd.read_csv(file, usecols=[2, 8, 9, 10, 11, 12, 13, 14, 15],
                 names=["Country Name", "[YR2013]", "[YR2014]", "[YR2015]", "[YR2016]",
                        "[YR2017]", "[YR2018]", "[YR2019]", "[YR2020]"],
                 skiprows=1, index_col=0, nrows=20)

pd.set_option("expand_frame_repr", False)
df.dropna(inplace=True)

# load DataFrame
net.load_df(df) # Code A

# cluster using default parameters
net.cluster() # Code B

# make interactive widget
net.widget() #Code C

Let us see what the above code does:

  1. Import numpy and pandas under their usual aliases, and import the net object from clustergrammer2.
  2. Load the DataFrame into the net object, which can load, filter, normalize, and cluster data and render the widget. Refer to Code A.
  3. Cluster the data using the default parameters. Refer to Code B.
  4. Create the interactive widget. Refer to Code C.
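Clustergrammer produces hierarchically clustered heatmaps: clustering reorders rows and columns so that similar ones sit next to each other. The sketch below illustrates that general idea with SciPy's hierarchical clustering. It is not Clustergrammer's internal implementation. The average linkage with Euclidean distance is an assumption chosen for illustration, and the data is hypothetical.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# Hypothetical growth rates (%), made up for illustration only
df = pd.DataFrame(
    {"[YR2018]": [2.0, 5.5, 2.1, 5.6],
     "[YR2019]": [2.2, 5.8, 2.3, 5.7],
     "[YR2020]": [-3.0, 1.0, -2.8, 1.2]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Cluster rows (countries) and columns (years); "average" linkage with
# Euclidean distance is an assumption, not necessarily Clustergrammer's default.
row_order = leaves_list(linkage(df.values, method="average", metric="euclidean"))
col_order = leaves_list(linkage(df.values.T, method="average", metric="euclidean"))

# Reordered copy: similar countries and years now sit next to each other
clustered = df.iloc[row_order, col_order]
print(clustered)
```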

The interactive widget is displayed in a Jupyter notebook, not in IDLE or a plain Python shell.

Let us see how to run the code in a Jupyter notebook.

In the terminal window, type the following command:

jupyter notebook

A new browser window opens with the Jupyter Notebook dashboard.

In the top-right corner, click the New menu, then click Python 3, as shown in the image below.

Image 14

A new Python notebook opens, as shown in the image below:

Image 15

Paste the code into the input cell as shown below, and click the Run button.

Image 16

You can see the heatmap widget below:

Image 17

Clustergrammer offers the following interactive features:

  1. Zooming and panning.
  2. Row and Column Reordering.
  3. Cropping.
  4. Row Searching.

Summary

A heatmap is a data visualization tool that helps you interpret data quickly. Each value is drawn as a cell whose shade, from light to dark, encodes its magnitude. With the default color maps used above, darker cells indicate lower GDP growth and lighter cells higher growth. This article covered four ways to plot heatmaps in Python: seaborn.heatmap(), matplotlib.pyplot, plotly.express, and Clustergrammer.

In my view, seaborn.heatmap() is the best option, because its code is shorter and easier to understand. The other modules have their own benefits and numerous features.

Now you are familiar with creating heatmaps in Python using various modules. Try making heatmaps with each of them, and send me your feedback at thoufeeq87.mtr (at) gmail (dot) com.