Pandas Plotting – Part 19

The Pandas DataFrame/Series has several methods related to plotting.

This is Part 19 of the series regarding the DataFrame methods.

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
  • Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
  • Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
  • Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
  • Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
  • Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
  • Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
  • Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
  • Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
  • Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
  • Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
  • Part 21 focuses on converting one data type to/from another data type.
  • Part 22 focuses on converting one data type to/from another data type.
  • Part 23 focuses on converting one data type to/from another data type.
  • Part 24 focuses on converting one data type to/from another data type.
  • Part 25 focuses on converting one data type to/from another data type.

Preparation

Before any data manipulation can occur, three (3) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Matplotlib library displays a visual graph of a plotted dataset.
  • The Scipy library allows users to manipulate and visualize the data.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. The prompt for the terminal used in this example is a dollar sign ($); your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install matplotlib

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install scipy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
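
One quick way to confirm that all three libraries can be imported is to print their versions. This is only a minimal sketch; the version numbers shown will depend on your environment.

```python
import matplotlib
import pandas as pd
import scipy

# Print the installed version of each library used in this article.
for name, module in (("pandas", pd), ("matplotlib", matplotlib), ("scipy", scipy)):
    print(f"{name}: {module.__version__}")
```

If any of the imports fails, re-run the matching pip install command above.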


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import matplotlib.pyplot as plt
import scipy

DataFrame Plot

The plot() method creates visual graphs based on a dataset of a DataFrame or Series.

The syntax for this method is as follows:

DataFrame.plot(*args, **kwargs)
Parameter       Description
data            This parameter is a DataFrame/Series dataset.
x               This parameter is a label/position (for a DataFrame only).
kind            This parameter is a string and indicates the type of plot to create:
                  'line': line plot (default)
                  'bar': vertical bar chart
                  'barh': horizontal bar chart
                  'hist': histogram
                  'box': boxplot
                  'kde': Kernel Density plot
                  'density': same as 'kde'
                  'area': area plot
                  'pie': pie plot
                  'scatter': scatter plot (DataFrame)
                  'hexbin': hexbin plot (DataFrame)
ax              This parameter is the Matplotlib axis object.
subplots        This parameter makes subplots for each column separately.
sharex          If subplots, share the x-axis and set some x-axis labels to invisible.
sharey          If subplots, share the y-axis and set some y-axis labels to invisible.
layout          A tuple that determines the row/column layout for subplots.
figsize         This parameter sets the size (width and height) of the figure.
use_index       Use the index as ticks for the x-axis.
title           The heading to use for the plot (graph).
grid            These are the axis grid lines.
legend          Display legend on the axis subplots. Displays by default (True).
style           The line style per column (matplotlib).
logx            Use log/symlog scaling on the x-axis.
logy            Use log/symlog scaling on the y-axis.
loglog          Use log/symlog scaling on both the x-axis and y-axis.
xticks          The value to use for xticks.
yticks          The value to use for yticks.
xlim            Set the x limits of the current axis.
ylim            Set the y limits of the current axis.
xlabel          Name for the x-axis.
ylabel          Name for the y-axis.
rot             The rotation for ticks (xticks vertical/yticks horizontal).
fontsize        The size of the font to use for both xticks/yticks.
colormap        This parameter is the color map to select specific colors.
position        These are the alignments for the bar plot.
table           If True, create a table using DataFrame data. This data will transpose to the matplotlib default layout.
yerr            See plotting with Error Bars.
xerr            See plotting with Error Bars.
stacked         If set to True, create a stacked plot.
sort_columns    This parameter sorts the column name(s) for plot ordering.
secondary_y     This parameter determines if it plots on the secondary y-axis.
mark_right      If set, determines if using a secondary_y axis auto marks the column labels with right in the legend.
include_bool    If set to True, Boolean values will be available to plot.
backend         This parameter determines the backend to use instead of the option plotting.backend.
**kwargs        This parameter is the option(s) passed to the matplotlib library.
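
To illustrate how several of these parameters combine, here is a minimal sketch that draws one subplot per column. The DataFrame values and month labels are hypothetical and serve only as sample data; the non-interactive backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, used only for illustration.
df = pd.DataFrame(
    {"Sales": [3, 2, 3, 9, 10, 6], "Visits": [19, 41, 26, 61, 71, 60]},
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

# One subplot per column, arranged in one row and two columns.
axes = df.plot(
    kind="line",
    subplots=True,
    layout=(1, 2),
    figsize=(8, 3),
    sharex=True,
    grid=True,
    title=["Sales", "Visits"],  # with subplots=True, a list gives one title per subplot
    rot=45,
)
```

With subplots=True, plot() returns an array of Matplotlib axes rather than a single axis, so each subplot can be adjusted individually afterwards.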

This example reads in the countries.csv file and plots the Country, Population, and Area columns on a Line chart.

💡 Note: Click here to download this file. Move it to the current working directory.

df = pd.read_csv('countries.csv')
ax = plt.gca()

df.plot(kind='line', x='Country', y='Population', 
        title='Sample Countries', fontsize=8, ax=ax)
df.plot(kind='line',x='Country', y='Area', ax=ax)
plt.savefig('plot_line.png')
plt.show()
  • Line [1] reads in a comma-delimited CSV file and saves it to a DataFrame (df).
  • Line [2] gets the current axes (gca()) and saves it to ax.
  • Line [3] does the following:
    • sets the kind parameter to a Line chart
    • sets the columns to Country and Population
    • sets the title and font size
    • sets the ax variable created above
  • Line [4] does the following:
    • sets the kind parameter to a Line chart
    • sets the columns to Country and Area
    • sets the ax variable created above
  • Line [5] saves the Line chart as an image file and places this file in the current working directory.
  • Line [6] displays the Line chart on-screen.

💡 Note: The gca() method gets the current axes of the current figure, or creates a new one if none exists.

Output – On-Screen

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is to use the plot.line() method.
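
As a rough sketch of the plot.line() alternative, the snippet below draws two lines on the same axes. Because countries.csv is not reproduced here, the rows are hypothetical stand-ins that reuse only the column names from the example; the non-interactive backend is likewise an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for countries.csv (the values are made up).
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population": [10, 20, 15],
    "Area": [5, 8, 12],
})

# Create the figure and axes explicitly instead of relying on plt.gca().
fig, ax = plt.subplots()

# Both calls draw onto the same axes, so the lines share one chart.
df.plot.line(x="Country", y="Population", title="Sample Countries", fontsize=8, ax=ax)
df.plot.line(x="Country", y="Area", ax=ax)
```

Creating the axes with plt.subplots() is a design choice for this sketch: it makes explicit which figure the lines belong to, which can help once a script produces more than one chart.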


DataFrame Plot Area

The DataFrame.plot.area() method creates a stacked Area plot chart.

The syntax for this method is as follows:

DataFrame.plot.area(x=None, y=None, **kwargs)
Parameter       Description
x               This parameter determines the coordinates for the x-axis. The default value is the index.
y               This parameter specifies the coordinates for the y-axis. The default value is the columns.
**kwargs        Additional keywords are outlined above in the plot() method.
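
Area plots are stacked by default. As a minimal sketch of the difference, the snippet below draws the same data stacked and unstacked side by side using the stacked keyword. The data and month labels are hypothetical, and the non-interactive backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, used only for illustration.
df = pd.DataFrame(
    {"Sales": [3, 2, 3, 9, 10, 6], "New-Custs": [7, 7, 6, 11, 17, 13]},
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

fig, (ax_stacked, ax_overlaid) = plt.subplots(1, 2, figsize=(9, 3))

# Default behaviour: each column is stacked on top of the previous one.
df.plot.area(title="Stacked (default)", ax=ax_stacked)

# stacked=False overlays the areas, each starting from zero.
df.plot.area(stacked=False, title="Unstacked", ax=ax_overlaid)
```

In the stacked chart, the top edge shows the running total of the columns; in the unstacked chart, each area shows its own values.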

For this example, Rivers Clothing would like to plot an Area chart indicating Sales, New Customers, and Unique Visits to their online store over six (6) months.

df = pd.DataFrame({'Sales':    [3, 2, 3, 9, 10, 6],
                  'New-Custs': [7, 7, 6, 11, 17, 13],
                  'Visits':    [19, 41, 26, 61, 71, 60]},
index=pd.date_range(start='2022/01/01', end='2022/07/01', freq='M'))
ax = plt.gca()
df.plot.area(title='Sales Stats - 6 Months', fontsize=8, ax=ax)
plt.show()
  • Line [1] creates a DataFrame from a dictionary of lists. This output saves to df.
  • Line [2] creates an index based on a date range and frequency.
  • Line [3] gets the current axes (gca()) and saves it to ax.
  • Line [4] does the following:
    • creates the Area chart
    • sets the title and font size
    • sets the ax variable created above
  • Line [5] outputs the Area chart on-screen.

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'area' option.


DataFrame Vertical Bar

The pandas.DataFrame.plot.bar() method creates a Vertical Bar chart representing data with rectangular bars. The height of each bar is proportional to the value it represents.

The syntax for this method is as follows:

DataFrame.plot.bar(x=None, y=None, **kwargs)
Parameter       Description
x               This parameter determines the coordinates for the x-axis. Default is the index.
y               This parameter determines the coordinates for the y-axis. Default is columns.
color           This parameter can be a string, an array, or a dictionary to signify color(s):
                  – A single color specified by name, RGB, or RGBA.
                  – A color sequence specified by name, RGB, or RGBA.
                  – A dict of the form {column name: color} so each column is colored differently.
**kwargs        Additional keywords are outlined above in the plot() method.
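
The color parameter can be combined with the stacked keyword from the plot() method. The following is a minimal sketch with hypothetical data and color choices; the non-interactive backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical units sold per size, used only for illustration.
df = pd.DataFrame(
    {"Tops": [40, 12, 10], "Pants": [19, 8, 30]},
    index=["XS", "S", "M"],
)

fig, ax = plt.subplots()

# A dict maps each column name to its own color; stacked=True piles the
# columns on top of each other instead of placing them side by side.
df.plot.bar(
    stacked=True,
    color={"Tops": "#8A2BE2", "Pants": "#6495ED"},
    rot=0,  # keep the size labels horizontal
    ax=ax,
)
ax.set_xlabel("Sizes")
ax.set_ylabel("Sold")
```

Each bar is one rectangle per column and index value, so this chart contains six rectangles in total.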

Rivers Clothing would like a Vertical Bar chart of its sales based on sizes sold over the past six (6) months.

df = pd.DataFrame({'Tops':   [40, 12, 10, 26, 36],
                   'Pants':  [19, 8, 30, 21, 38],
                   'Coats':  [10, 10, 42, 17, 37]}, 
                    index=['XS', 'S', 'M', 'L', 'XL'])
ax = plt.gca()

df.plot.bar(ax=ax)
plt.title('Rivers Clothing - Sold')
plt.xlabel('Sizes')
plt.ylabel('Sold')
plt.show()

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'bar' option.


DataFrame Horizontal Bar

The pandas.DataFrame.plot.barh() method creates a Horizontal Bar chart representing data with rectangular bars. The length (width) of each bar is proportional to the value it represents.

The syntax for this method is as follows:

DataFrame.plot.barh(x=None, y=None, **kwargs)
Parameter       Description
x               This parameter determines the coordinates for the x-axis. Default is the index.
y               This parameter determines the coordinates for the y-axis. Default is columns.
color           This parameter can be a string, an array, or a dictionary to signify color(s):
                  – A single color specified by name, RGB, or RGBA.
                  – A color sequence specified by name, RGB, or RGBA.
                  – A dict of the form {column name: color} so each column is colored differently.
**kwargs        Additional keywords are outlined above in the plot() method.

Rivers Clothing would like a Horizontal Bar chart of its sales based on sizes sold over the past six (6) months.

custom_colors = {'Tops': '#8A2BE2', 'Pants': '#6495ED', 'Coats': '#E6E6FA'}

df = pd.DataFrame({'Tops':   [40, 12, 10, 26, 36],
                   'Pants':  [19, 8, 30, 21, 38],
                   'Coats':  [10, 10, 42, 17, 37]}, 
                   index=['XS', 'S', 'M', 'L', 'XL'])
ax = plt.gca()
df.plot.barh(color=custom_colors, ax=ax)
plt.title('Rivers Clothing - Sold')
plt.xlabel('Sold')
plt.ylabel('Sizes')
plt.show()
  • Line [1] creates a dictionary of color selections for the three (3) columns and saves it to custom_colors.
  • Line [2] creates a DataFrame from a dictionary of lists and saves it to df.
  • Line [3] gets the current axes (gca()) and saves it to ax.
  • Line [4] creates the Horizontal Bar chart using custom_colors and the ax variable created above.
  • Line [5-7] sets the title and labels.
  • Line [8] outputs the Horizontal Bar chart on-screen.

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'barh' option.


DataFrame Plot Box

The DataFrame.plot.box() method creates a Box-and-Whisker plot from the DataFrame column(s). In short, this type of plot summarizes the minimum, first quartile, median, third quartile, and maximum values of a dataset.

For a detailed definition of a Box plot, click here.
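
To make the five summary values concrete, here is a minimal sketch that computes them with Series.quantile(). The prices are sample values, and the label names (min, Q1, and so on) are chosen for this illustration only.

```python
import pandas as pd

# Sample stock prices, used only for illustration.
prices = pd.Series([3.34, 1.99, 2.25, 4.57, 5.74, 3.65], name="Stock Price")

# The five values a Box plot summarizes: minimum, first quartile,
# median, third quartile, and maximum.
summary = prices.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
summary.index = ["min", "Q1", "median", "Q3", "max"]
print(summary)
```

Note that Matplotlib, by default, draws the whiskers to the furthest data points within 1.5 times the interquartile range of the box, so they match the minimum and maximum only when the data has no outliers.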

The syntax for this method is as follows:

DataFrame.plot.box(by=None, **kwargs)
Parameter       Description
by              This parameter is a string and signifies the column by which to group the DataFrame.
**kwargs        The keyword arguments for the method.
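
As a minimal sketch of calling plot.box() directly, the snippet below draws one box per numeric column. The data is hypothetical, and the non-interactive backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical units sold per product, used only for illustration.
df = pd.DataFrame({
    "Tops": [40, 12, 10, 26, 36],
    "Pants": [19, 8, 30, 21, 38],
    "Coats": [10, 10, 42, 17, 37],
})

fig, ax = plt.subplots()

# Without the by parameter, each column gets its own box.
df.plot.box(title="Units Sold per Product", grid=True, ax=ax)
```

Calling df.plot(kind='box', ...) with the same keywords should produce the same chart.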

For this example, Rivers Clothing requires a Box plot that documents how its stock is performing on the Stock Exchange. The stock prices are reviewed twice a day on three (3) days in January (1st, 15th, and 30th).

stock_dates  = ['Jan-01', 'Jan-01', 'Jan-15', 'Jan-15', 'Jan-30', 'Jan-30']
stock_prices = [3.34, 1.99, 2.25, 4.57, 5.74, 3.65]
ax = plt.gca()

df = pd.DataFrame({'Stock Date':  stock_dates, 'Stock Price': stock_prices})
boxplot = df.boxplot(column=['Stock Price'], by='Stock Date', grid=True, rot=30, fontsize=10, ax=ax)
plt.show()
  • Line [1] creates a list of dates and saves them to stock_dates.
  • Line [2] creates a list of stock prices and saves it to stock_prices.
  • Line [3] gets the current axes (gca()) and saves it to ax.
  • Line [4] creates a DataFrame from the variables saved above.
  • Line [5] does the following:
    • Creates the Box chart based on the Stock Prices and Dates.
    • Displays the grid lines on the chart.
    • Rotates the date labels at the chart bottom by 30 degrees.
    • Sets the font size to 10.
    • Sets the ax created above.
  • Line [6] outputs the Box chart on-screen.

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'box' option.


DataFrame Plot Density

The DataFrame.plot.density() method generates Kernel Density Estimate (KDE) plots using Gaussian kernels.

Direct Quote from Wikipedia:

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample.

from Wikipedia

The syntax for this method is as follows:

DataFrame.plot.density(bw_method=None, ind=None, **kwargs)
Parameter       Description
bw_method       This parameter is the method used to calculate the estimator bandwidth. It can be 'scott', 'silverman', a scalar, or a callable. Click here for details.
ind             This parameter sets the evaluation points for the estimated PDF. If None (the default), 1000 equally spaced points are used.
**kwargs        The keyword arguments for this method are outlined in the plot() method.
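
To show how bw_method and ind affect the result, here is a minimal sketch that draws two density curves for the same Series. The data values, the scalar bandwidth of 0.3, and the evaluation range of 0 to 100 are arbitrary choices for illustration; the non-interactive backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical student counts, used only for illustration.
s = pd.Series([12, 11, 13, 14, 17, 11, 18, 29, 47, 76], name="Grade-10")

fig, ax = plt.subplots()

# Default evaluation grid with the 'scott' bandwidth rule.
s.plot.density(bw_method="scott", label="scott", ax=ax)

# A smaller scalar bandwidth gives a less smooth curve; ind sets the
# points at which the density is evaluated (200 points from 0 to 100).
s.plot.density(bw_method=0.3, ind=np.linspace(0, 100, 200), label="bw_method=0.3", ax=ax)

ax.legend()
```

A smaller bandwidth follows the individual data points more closely, while a larger one smooths them into a broader curve.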

For this example, a KDE chart plots the number of students who attended Grades 10 and 11 at Simms High School over the previous ten (10) years.

df = pd.DataFrame({
'Grade-10':  [12, 11, 13, 14, 17, 11, 18, 29, 47, 76],
'Grade-11':  [11, 16, 15, 28, 35, 36, 61, 68, 59, 67]})
ax = plt.gca()

df.plot.kde(title="KDE - Students - Previous 10 Years", ax=ax)
plt.show()
  • Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
  • Line [2] gets the current axes (gca()) and saves it to ax.
  • Line [3] creates a KDE chart and sets the chart title.
  • Line [4] outputs the KDE chart on-screen.

Output

The buttons on the bottom left can further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'kde' option.