Pandas Plotting - plot() & plot.area() - Be on the Right Side of Change

The Pandas DataFrame/Series has several methods related to plotting.

Preparation

Before any data manipulation can occur, three (3) new libraries will require installation.

The Pandas library enables access to/from a DataFrame.
The Matplotlib library displays a visual graph of a plotted dataset.
The SciPy library supplies the statistical routines that Pandas relies on to draw Kernel Density Estimate (KDE) plots.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install matplotlib

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install scipy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
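One quick way to double-check the installations, offered here as a small sketch rather than a required step, is to import each library and print its version. The version numbers shown on your machine will differ from anyone else's.

```python
# Import each library and report the version that is installed.
import matplotlib
import pandas
import scipy

versions = {
    "pandas": pandas.__version__,
    "matplotlib": matplotlib.__version__,
    "scipy": scipy.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails with a ModuleNotFoundError, that library did not install into the interpreter you are running.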

Feel free to view the PyCharm installation guide for the required libraries.

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import matplotlib.pyplot as plt
import scipy

DataFrame Plot

The plot() method creates visual graphs based on a dataset of a DataFrame or Series.

The syntax for this method is as follows:

DataFrame.plot(*args, **kwargs)

Parameter	Description
`data`	This parameter is a DataFrame/Series dataset.
`x`	This parameter is a label/position (for a DataFrame only).
`kind`	This parameter is a string and indicates the type of plot to create: `'line'`: line plot (default); `'bar'`: vertical bar chart; `'barh'`: horizontal bar chart; `'hist'`: histogram; `'box'`: boxplot; `'kde'`: Kernel Density Estimate plot; `'density'`: same as `'kde'`; `'area'`: area plot; `'pie'`: pie plot; `'scatter'`: scatter plot (DataFrame only); `'hexbin'`: hexbin plot (DataFrame only).
`ax`	This parameter is the Matplotlib axis object.
`subplots`	This parameter makes subplots for each column separately.
`sharex`	If subplots, share x-axis and set some x-axis labels to invisible.
`sharey`	If subplots, share the y-axis and set some y-axis labels to invisible.
`layout`	A tuple that determines the row/column layout for subplots.
`figsize`	This parameter sets the size (width and height) of the figure.
`use_index`	Use the index as ticks for the x-axis.
`title`	The heading to use for the plot (graph).
`grid`	If `True`, display the axis grid lines.
`legend`	Display legend on the axis subplots. Displays by default (`True`).
`style`	The line style per column (matplotlib).
`logx`	Use log/symlog scaling on the x-axis.
`logy`	Use log/symlog scaling on the y-axis.
`loglog`	Use log/symlog scaling on both the x-axis and y-axis.
`xticks`	The value to use for xticks.
`yticks`	The value to use for yticks.
`xlim`	Set the x limits of the current axis.
`ylim`	Set the y limits of the current axis.
`xlabel`	Name for the x-axis.
`ylabel`	Name for the y-axis.
`rot`	The rotation for ticks (xticks vertical/yticks horizontal).
`fontsize`	The size of the font to use for both xticks/yticks.
`colormap`	This parameter is the color map to select specific colors.
`position`	These are the alignments for the bar plot.
`table`	If True, create a table using DataFrame data. This data will transpose to the matplotlib default layout.
`yerr`	See plotting with Error Bars.
`xerr`	See plotting with Error Bars.
`stacked`	If set to `True`, create a stacked plot.
`sort_columns`	This parameter sorts the column name(s) for plot ordering. It was deprecated in Pandas 1.5 and removed in Pandas 2.0.
`secondary_y`	This parameter determines whether to plot on the secondary y-axis.
`mark_right`	When using a secondary_y axis, this parameter determines whether the column labels are automatically marked with "(right)" in the legend.
`include_bool`	If set to `True`, Boolean values will be available to plot.
`backend`	This parameter determines the backend to use instead of the option `plotting.backend`.
`**kwargs`	This parameter is the option(s) passed to the matplotlib library.
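To see how a few of these parameters work together, here is a minimal sketch that combines subplots, layout, sharex, figsize, grid, and title. The DataFrame and its values are made up for illustration and are not part of the article's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, invented for this illustration only.
df = pd.DataFrame(
    {"Sales": [3, 2, 3, 9, 10, 6], "Visits": [19, 41, 26, 61, 71, 60]},
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

# One subplot per column, arranged in 2 rows x 1 column with a shared x-axis.
axes = df.plot(
    kind="line",
    subplots=True,
    layout=(2, 1),
    sharex=True,
    figsize=(6, 4),
    grid=True,
    title="Monthly Stats",
)

# With subplots=True and a layout, plot() returns an array of Axes.
print(axes.shape)
```

Because subplots is set to True, plot() hands back an array of Axes (one per column) instead of a single Axes object, so any further styling needs to be applied to each element.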

This example reads in the countries.csv file and plots the Country, Population, and Area columns on a Line chart.

💡 Note: Click here to download this file. Move it to the current working directory.

df = pd.read_csv('countries.csv')
ax = plt.gca()

df.plot(kind='line', x='Country', y='Population', 
        title='Sample Countries', fontsize=8, ax=ax)
df.plot(kind='line', x='Country', y='Area', ax=ax)
plt.savefig('plot_line.png')
plt.show()

Line [1] reads in a comma-delimited CSV file and saves it to a DataFrame (df).
Line [2] gets the current axes (gca()) and saves it to ax.
Line [3] does the following:
- sets the kind parameter to a Line chart
- sets the columns to Country and Population
- sets the title and font size
- sets the ax variable created above
Line [4] does the following:
- sets the kind parameter to a Line chart
- sets the columns to Country and Area
- sets the ax variable created above
Line [5] saves the Line chart as an image file and places this file in the current working directory.
Line [6] displays the Line chart on-screen.

💡 Note: The gca() method gets the current Axes of the current figure, or creates a new one if none exists.

Output – On-Screen

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is to use the plot.line() method.
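As a rough sketch of the plot.line() alternative, the code below draws both columns with one call by passing a list to y. Since countries.csv may not be at hand, it uses a small stand-in DataFrame whose country names and numbers are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for countries.csv (hypothetical values, for illustration only).
df = pd.DataFrame({
    "Country": ["Alpha", "Beta", "Gamma"],
    "Population": [120, 340, 210],
    "Area": [80, 150, 95],
})

fig, ax = plt.subplots()

# Passing a list to y draws one line per column on the same Axes.
df.plot.line(x="Country", y=["Population", "Area"],
             title="Sample Countries", fontsize=8, ax=ax)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Here plt.subplots() is used instead of plt.gca() so that the figure and the Axes are created explicitly. Either approach gives plot.line() an Axes to draw on.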

DataFrame Plot Area

The DataFrame.plot.area() method creates a stacked Area plot chart.

The syntax for this method is as follows:

DataFrame.plot.area(x=None, y=None, **kwargs)

`x`	This parameter determines the coordinates for the x-axis. The default value is the index.
`y`	This parameter specifies the coordinates for the y axis. The default value is the columns.
`**kwargs`	Additional keywords are outlined above in the `plot` method.

For this example, Rivers Clothing would like to plot an Area chart indicating Sales, New Customers, and Unique Visits to their online store over six (6) months.

df = pd.DataFrame({'Sales':     [3, 2, 3, 9, 10, 6],
                   'New-Custs': [7, 7, 6, 11, 17, 13],
                   'Visits':    [19, 41, 26, 61, 71, 60]},
                  # freq='M' means month-end; Pandas 2.2 and later prefer the alias 'ME'
                  index=pd.date_range(start='2022/01/01', end='2022/07/01', freq='M'))
ax = plt.gca()
df.plot.area(title='Sales Stats - 6 Months', fontsize=8, ax=ax)
plt.show()

Line [1] creates a DataFrame from a dictionary of lists. This output saves to df.
Line [2] creates an index based on a date range and frequency.
Line [3] gets the current axes (gca()) and saves it to ax.
Line [4] does the following:
- creates the Area chart
- sets the title and font size
- sets the ax variable created above
Line [5] outputs the Area chart on-screen.

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'area' option.
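The sketch below shows that alternative and, as an extra assumption for illustration, turns stacking off with stacked=False so each column is drawn from the baseline instead of on top of the previous one. It reuses the Rivers Clothing figures with a plain month-number index (1 to 6) in place of the dates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Same figures as the example above; the 1-6 month index is a simplification.
df = pd.DataFrame({"Sales":     [3, 2, 3, 9, 10, 6],
                   "New-Custs": [7, 7, 6, 11, 17, 13],
                   "Visits":    [19, 41, 26, 61, 71, 60]},
                  index=range(1, 7))

# kind="area" is equivalent to calling df.plot.area().
ax = df.plot(kind="area", stacked=False, title="Sales Stats - Unstacked")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

A stacked chart is handy for showing totals, while the unstacked version makes it easier to compare the individual columns against each other.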

DataFrame Vertical Bar

The pandas.DataFrame.plot.bar() method creates a Vertical Bar chart, which represents data with rectangular bars. The height of each bar corresponds to the value it represents.

The syntax for this method is as follows:

DataFrame.plot.bar(x=None, y=None, **kwargs)

Parameter	Description
`x`	This parameter determines the coordinates for the x-axis. Default is the index.
`y`	This parameter determines the coordinates for the y axis. Default is columns.
`color`	This parameter can be a string, an array, or a dictionary to signify color(s): a single color specified by name, RGB, or RGBA; a sequence of colors specified by name, RGB, or RGBA; or a dict of the form {column name: color} so each column is colored differently.
`**kwargs`	Additional keywords are outlined above in the `plot()` method.

Rivers Clothing would like a Vertical Bar chart of its sales based on sizes sold over the past six (6) months.

df = pd.DataFrame({'Tops':   [40, 12, 10, 26, 36],
                   'Pants':  [19, 8, 30, 21, 38],
                   'Coats':  [10, 10, 42, 17, 37]},
                  index=['XS', 'S', 'M', 'L', 'XL'])
ax = plt.gca()

df.plot.bar(ax=ax)
plt.title('Rivers Clothing - Sold')
plt.xlabel('Sizes')
plt.ylabel('Sold')
plt.show()

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'bar' option.
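Here is a short sketch of that alternative. As an added assumption for illustration, it also sets stacked=True (described in the plot() parameter table) so the three columns are piled into one bar per size, and rot=0 to keep the size labels upright.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Tops":  [40, 12, 10, 26, 36],
                   "Pants": [19, 8, 30, 21, 38],
                   "Coats": [10, 10, 42, 17, 37]},
                  index=["XS", "S", "M", "L", "XL"])

# kind="bar" is equivalent to df.plot.bar(); stacked=True piles the columns up.
ax = df.plot(kind="bar", stacked=True, rot=0, title="Rivers Clothing - Sold")
ax.set_xlabel("Sizes")
ax.set_ylabel("Sold")

# In a stacked chart, each 'Pants' segment starts where the 'Tops' segment ends.
pants_start_xs = ax.containers[1].patches[0].get_y()
print(pants_start_xs)
```

The total height of each stacked bar equals the row sum for that size, which makes this layout a reasonable choice when the overall number sold matters more than the split.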

DataFrame Horizontal Bar

The pandas.DataFrame.plot.barh() method creates a Horizontal Bar chart, which represents data with rectangular bars. The length of each bar corresponds to the value it represents.

The syntax for this method is as follows:

DataFrame.plot.barh(x=None, y=None, **kwargs)

Parameter	Description
`x`	This parameter determines the coordinates for the x-axis. Default is the index.
`y`	This parameter determines the coordinates for the y axis. Default is columns.
`color`	This parameter can be a string, an array, or a dictionary to signify color(s): a single color specified by name, RGB, or RGBA; a sequence of colors specified by name, RGB, or RGBA; or a dict of the form {column name: color} so each column is colored differently.
`**kwargs`	Additional keywords are outlined above in the `plot()` method.

Rivers Clothing would like a Horizontal Bar chart of its sales based on sizes sold over the past six (6) months.

custom_colors = {'Tops': '#8A2BE2', 'Pants': '#6495ED', 'Coats': '#E6E6FA'}

df = pd.DataFrame({'Tops':   [40, 12, 10, 26, 36],
                   'Pants':  [19, 8, 30, 21, 38],
                   'Coats':  [10, 10, 42, 17, 37]}, 
                   index=['XS', 'S', 'M', 'L', 'XL'])
ax = plt.gca()
df.plot.barh(color=custom_colors, ax=ax)
plt.title('Rivers Clothing - Sold')
plt.xlabel('Sold')   # bar lengths (values) run along the x-axis
plt.ylabel('Sizes')  # categories (index) sit on the y-axis
plt.show()

Line [1] creates a dictionary of color selections for the three (3) columns and saves it to custom_colors.
Line [2] creates a DataFrame from a dictionary of lists, indexed by size, and saves it to df.
Line [3] gets the current axes (gca()) and saves it to ax.
Line [4] creates the Horizontal Bar chart using custom_colors and the ax variable created above.
Line [5-7] sets the title and labels.
Line [8] outputs the Horizontal Bar chart on-screen.

Output

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'barh' option.
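The x and y parameters can also pick out a single column to chart. The sketch below assumes the sizes are stored in a regular column (named Size here purely for illustration) rather than in the index, and plots only the Tops column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# 'Size' is a hypothetical column name; the sales figures match the example above.
df = pd.DataFrame({"Size":  ["XS", "S", "M", "L", "XL"],
                   "Tops":  [40, 12, 10, 26, 36],
                   "Pants": [19, 8, 30, 21, 38]})

# x selects the category column, y selects the column of values to draw.
ax = df.plot.barh(x="Size", y="Tops", color="#8A2BE2", title="Tops Sold by Size")
ax.set_xlabel("Sold")
ax.set_ylabel("Sizes")

# On a horizontal bar chart the value is the bar's width, not its height.
widths = [patch.get_width() for patch in ax.patches]
print(widths)
```

Note that even though the parameter is called x, the categories it selects end up along the vertical axis of a horizontal bar chart.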

DataFrame Plot Box

The DataFrame.plot.box() method creates a Box-and-Whisker plot from the DataFrame column(s). In short, this type of plot summarizes the minimum, first quartile, median, third quartile, and maximum values of a dataset.

For a detailed definition of a Box plot, click here.

The syntax for this method is as follows:

DataFrame.plot.box(by=None, **kwargs)

Parameter	Description
`by`	This parameter is a string and signifies the column to group the DataFrame by.
`**kwargs`	The keyword arguments for the method

For this example, Rivers Clothing requires a Box plot. This documents how its stock is performing on the Stock Exchange. The stock prices are reviewed twice a day for three (3) days in January (the 1st, 15th, and 30th).

stock_dates  = ['Jan-01', 'Jan-01', 'Jan-15', 'Jan-15', 'Jan-30', 'Jan-30']
stock_prices = [3.34, 1.99, 2.25, 4.57, 5.74, 3.65]
ax = plt.gca()

df = pd.DataFrame({'Stock Date':  stock_dates, 'Stock Price': stock_prices})
boxplot = df.boxplot(column=['Stock Price'], by='Stock Date', grid=True, rot=30, fontsize=10, ax=ax)
plt.show()

Line [1] creates a list of dates and saves them to stock_dates.
Line [2] creates a list of stock prices and saves them to stock_prices.
Line [3] gets the current axes (gca()) and saves it to ax.
Line [4] creates a DataFrame from the variables saved above.
Line [5] does the following:
- Creates the Box chart based on the Stock Prices and Dates.
- Displays the grid lines on the chart.
- Rotates the date labels at the chart bottom by 30 degrees.
- Sets the font size to 10.
- Sets the ax created above.
Line [6] outputs the Box chart on-screen.

The buttons on the bottom left can be used to further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'box' option.
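The example above calls df.boxplot(). For comparison, here is a minimal sketch that uses plot.box() directly. It assumes the same six prices are rearranged so that each date is its own column, which lets plot.box() draw one box per column without any grouping.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Same prices as above, reshaped: one column per date, one row per daily review.
df = pd.DataFrame({"Jan-01": [3.34, 1.99],
                   "Jan-15": [2.25, 4.57],
                   "Jan-30": [5.74, 3.65]})

# plot.box() draws one box for every column of the DataFrame.
ax = df.plot.box(title="Stock Price by Date", grid=True, rot=30, fontsize=10)
ax.set_ylabel("Stock Price")

# The line inside each box marks the column's median.
print(df.median())
```

With only two readings per date the boxes are very small samples, so treat the chart as a demonstration of the method and not as a meaningful summary.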

DataFrame Plot Density

The DataFrame.plot.density() method generates Kernel Density Estimate (KDE) plots using Gaussian kernels.

Direct Quote from Wikipedia:

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample.

The syntax for this method is as follows:

DataFrame.plot.density(bw_method=None, ind=None, **kwargs)

Parameter	Description
`bw_method`	This parameter is the method used to calculate the estimator bandwidth. It can be `'scott'`, `'silverman'`, a scalar, or a callable. Click here for details.
`ind`	This parameter sets the evaluation points for the estimated PDF. If `None` (the default), 1000 equally spaced points are used. If an integer, that number of equally spaced points is used.
`**kwargs`	The keyword arguments for this method are outlined in the plot method.

For this example, a KDE chart plots the distribution of the number of students who attended Grades 10 and 11 at Simms High School over the previous ten (10) years.

df = pd.DataFrame({
    'Grade-10':  [12, 11, 13, 14, 17, 11, 18, 29, 47, 76],
    'Grade-11':  [11, 16, 15, 28, 35, 36, 61, 68, 59, 67]})
ax = plt.gca()

df.plot.kde(title="KDE - Students - Previous 10 Years", ax=ax)
plt.show()

Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
Line [2] gets the current axes (gca()) and saves it to ax.
Line [3] creates a KDE chart and sets the chart title.
Line [4] outputs the KDE chart on-screen.

Output

The buttons on the bottom left can further manipulate the chart.

💡 Note: Another way to create this chart is with the plot() method and the kind parameter set to the 'kde' option.
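To illustrate the bw_method and ind parameters, here is a rough sketch using the same student numbers. The values 0.5 and 200 are arbitrary choices for demonstration: a smaller scalar bandwidth generally produces a less-smoothed curve, and an integer ind sets how many points each curve is evaluated at. SciPy must be installed for this to run.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the chart with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Grade-10": [12, 11, 13, 14, 17, 11, 18, 29, 47, 76],
    "Grade-11": [11, 16, 15, 28, 35, 36, 61, 68, 59, 67]})

# bw_method=0.5 and ind=200 are illustrative values, not recommendations.
ax = df.plot.density(bw_method=0.5, ind=200, title="KDE - Narrower Bandwidth")

# Each curve is evaluated at exactly `ind` equally spaced points.
points_per_curve = [len(line.get_xdata()) for line in ax.get_lines()]
print(points_per_curve)
```

Trying a few bandwidth values side by side is a reasonable way to judge how much smoothing suits a given dataset.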

Further Learning Resources

This is Part 19 of the DataFrame method series.

Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
Part 13 focuses on the DataFrame methods droplevel(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swaplevel(), stack(), unstack() and swapaxes()
Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), T and transpose()
Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickle().
Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
Part 25 focuses on the serialization and conversion methods to_gbq() and to_coo().

Also, have a look at the Pandas DataFrame methods cheat sheet!