Preparation
Before any data manipulation can occur, three (3) libraries need to be installed.
- The Pandas library enables access to/from a DataFrame.
- The Matplotlib library displays a visual graph of a plotted dataset.
- The SciPy library provides scientific computing routines; pandas relies on it for density (KDE) plots.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install matplotlib
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install scipy
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
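One way to double-check the installations from Python itself is to look up each package's installed version. This is a minimal sketch, assuming Python 3.8 or later; the helper name `installed_versions` is made up for this illustration and is not part of any of the three libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report on the three libraries this article relies on.
report = installed_versions(["pandas", "matplotlib", "scipy"])
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

Any library reported as not installed can be installed with the matching `pip install` command above.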
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import matplotlib.pyplot as plt
import scipy
DataFrame Plot
The plot() method creates visual graphs based on a dataset of a DataFrame or Series.
The syntax for this method is as follows:
DataFrame.plot(*args, **kwargs)
Parameter | Description |
---|---|
data | This parameter is a DataFrame/Series dataset. |
x | This parameter is a label/position (for a DataFrame only). |
kind | This parameter is a string and indicates the type of plot to create: 'line' (line plot, the default); 'bar' (vertical bar chart); 'barh' (horizontal bar chart); 'hist' (histogram); 'box' (boxplot); 'kde' (Kernel Density plot); 'density' (same as 'kde'); 'area' (area plot); 'pie' (pie plot); 'scatter' (scatter plot, DataFrame only); 'hexbin' (hexbin plot, DataFrame only). |
ax | This parameter is the Matplotlib axis object. |
subplots | This parameter makes subplots for each column separately. |
sharex | If subplots, share x-axis and set some x-axis labels to invisible. |
sharey | If subplots, share the y-axis and set some y-axis labels to invisible. |
layout | A tuple that determines the row/column layout for subplots. |
figsize | This parameter sets the size (width and height) of the figure. |
use_index | Use the index as ticks for the x-axis. |
title | The heading to use for the plot (graph). |
grid | These are the axis grid lines. |
legend | Display legend on the axis subplots. Displays by default (True). |
style | The line style per column (matplotlib). |
logx | Use log/symlog scaling on the x-axis. |
logy | Use log/symlog scaling on the y-axis. |
loglog | Use log/symlog scaling on both the x-axis and y-axis. |
xticks | The value to use for xticks. |
yticks | The value to use for yticks. |
xlim | Set the x limits of the current axis. |
ylim | Set the y limits of the current axis. |
xlabel | Name for the x-axis. |
ylabel | Name for the y-axis. |
rot | The rotation for ticks (xticks for vertical plots, yticks for horizontal plots). |
fontsize | The size of the font to use for both xticks/yticks. |
colormap | This parameter is the color map to select specific colors. |
position | These are the alignments for the bar plot. |
table | If True, create a table using DataFrame data. This data will transpose to the matplotlib default layout. |
yerr | See plotting with Error Bars. |
xerr | See plotting with Error Bars. |
stacked | If set to True, create a stacked plot. |
sort_columns | This parameter sorts the column name(s) for plot ordering. |
secondary_y | This parameter determines if it plots on the secondary y-axis. |
mark_right | If set to True when using a secondary_y axis, automatically marks the column labels with "(right)" in the legend. |
include_bool | If set to True, Boolean values will be available to plot. |
backend | This parameter determines the backend to use instead of the option plotting.backend. |
**kwargs | This parameter is the option(s) passed to the matplotlib library. |
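To make a few of these parameters concrete, here is a minimal sketch that combines `kind`, `x`, `y`, `title`, `xlabel`, `ylabel`, `figsize`, `rot`, `grid`, and `legend`, followed by a second call using `subplots`, `layout`, and `sharey`. The DataFrame and its column names are hypothetical sample data invented for this illustration, and the `Agg` backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to get on-screen windows
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame(
    {"Item": ["A", "B", "C"], "Sold": [10, 25, 15], "Returned": [1, 3, 2]}
)

# One bar chart with both numeric columns plotted against the Item column.
ax = df.plot(
    kind="bar",
    x="Item",
    y=["Sold", "Returned"],
    title="Sample Items",
    xlabel="Item",
    ylabel="Units",
    figsize=(6, 4),
    rot=0,        # keep the x tick labels horizontal
    grid=True,
    legend=True,
)

# One subplot per numeric column, arranged in 1 row and 2 columns, sharing the y-axis.
axes = df.plot(x="Item", subplots=True, layout=(1, 2), sharey=True, figsize=(8, 3))
```

The first call returns a single Matplotlib axes object, while the call with `subplots=True` returns an array of axes, one per plotted column.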
This example reads in the countries.csv file and plots the Country, Population, and Area columns on a Line chart.
💡 Note: Click here to download this file. Move it to the current working directory.
df = pd.read_csv('countries.csv')
ax = plt.gca()
df.plot(kind='line', x='Country', y='Population', title='Sample Countries', fontsize=8, ax=ax)
df.plot(kind='line', x='Country', y='Area', ax=ax)
plt.savefig('plot_line.png')
plt.show()
- Line [1] reads in a comma-delimited CSV file and saves it to a DataFrame (df).
- Line [2] gets the current axes (gca()) and saves it to ax.
- Line [3] does the following:
  - sets the kind parameter to a Line chart
  - sets the columns to Country and Population
  - sets the title and font size
  - sets the ax variable created above
- Line [4] does the following:
  - sets the kind parameter to a Line chart
  - sets the columns to Country and Area
  - sets the ax variable created above
- Line [5] saves the Line chart as an image file and places this file in the current working directory.
- Line [6] displays the Line chart on-screen.
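For readers without the CSV file at hand, the same steps can be sketched with a small in-memory DataFrame. The rows below are hypothetical placeholder values, not the contents of countries.csv, and the `Agg` backend is an assumption so the code runs without a display (which is also why `plt.show()` is left out).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to view the chart on-screen
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for countries.csv; these values are invented placeholders.
df = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C"],
        "Population": [120, 340, 210],
        "Area": [80, 150, 300],
    }
)

ax = plt.gca()  # current axes; both plot calls below draw onto it
df.plot(kind="line", x="Country", y="Population",
        title="Sample Countries", fontsize=8, ax=ax)
df.plot(kind="line", x="Country", y="Area", ax=ax)

out = Path("plot_line.png")
plt.savefig(out)  # written to the current working directory
```

Because both calls receive the same `ax`, the Population and Area lines end up on one chart rather than in two separate figures.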
💡 Note: The gca() method gets the current axes of the current figure, or creates new axes if none exist yet.
Output - On-Screen
The buttons on the bottom left can be used to further manipulate the chart.
💡 Note: Another way to create this chart is to use the plot.line() method.
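As a rough illustration of that alternative, the sketch below draws two columns with a single `plot.line()` call by passing a list to `y`. The DataFrame holds hypothetical placeholder values rather than the real countries.csv data, and the `Agg` backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to view the chart on-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder data, invented for this illustration.
df = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C"],
        "Population": [120, 340, 210],
        "Area": [80, 150, 300],
    }
)

# plot.line() behaves like plot(kind='line'); a list for y draws one line per column.
ax = df.plot.line(x="Country", y=["Population", "Area"],
                  title="Sample Countries", fontsize=8)
plt.savefig("plot_line_accessor.png")
```

Here a single call replaces the two `df.plot()` calls, so no shared `ax` variable is needed.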