# boxplot() – The Pandas.plotting Module

A box plot is a method used in statistics to graphically show a group, or groups, of numerical data with their quartiles identified. A box plot is often also called a box-and-whisker plot, as the plot may have lines extending from the box to show data outside the upper and lower quartiles.

In this article, we’ll quickly introduce you to the box plot and then show you how to use the function `boxplot()` from within the Pandas plotting module to create a plot from a `.csv` file.

## What is a Boxplot?

A boxplot is a standard way of displaying a dataset through five summary statistics: the minimum, the first quartile, the median, the third quartile, and the maximum. The boxplot also identifies any data lying outside the minimum and maximum, known as outliers.

You may be asking how a data point can lie outside the minimum and maximum. That's where an understanding of the interquartile range comes in, because in a box plot the minimum and maximum are not simply the smallest and largest values in the data. The interquartile range, also known as the middle 50%, is a statistical measure of data dispersion. If you subtract the first quartile from the third quartile, you get the interquartile range.

You use this figure to set the minimum and maximum points for the range of data. Multiply the interquartile range by 1.5 and subtract the result from the first quartile to calculate the minimum. Multiply the interquartile range by 1.5 and add the result to the third quartile to calculate the maximum. Any data that lie outside these minimum and maximum points are treated as outliers. Here’s a sketch to show you the components of the box plot.
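The arithmetic above can also be worked through in a few lines of Python. This is a minimal illustration, assuming a made-up list of ten values (not the penguin data) and pandas' default linear interpolation for quartiles; the variable names are ours.

```python
import pandas as pd

# Made-up sample values, chosen so that one point is an obvious outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1 = values.quantile(0.25)  # first quartile
q3 = values.quantile(0.75)  # third quartile
iqr = q3 - q1               # interquartile range

# Minimum and maximum points as described above.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything outside the limits is treated as an outlier.
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"limits: {lower_limit} to {upper_limit}")
print("outliers:", outliers.tolist())
```

With these values only the 50 falls outside the limits. Note that when the plot is drawn, the whiskers typically stop at the most extreme data points that still lie inside the limits, rather than at the limits themselves.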

## Using The Pandas Plotting Module’s boxplot() Function

The Pandas plotting module offers a collection of statistical plotting functions, one of which is the `boxplot()` function. `boxplot()` simplifies the analysis and graphical representation of the columns in a dataset.

We’ll be using the Palmer Archipelago (Antarctica) penguin dataset. It’s an ideal dataset for our purposes as it isn’t unwieldy yet nicely allows us to demonstrate the workings of this function.

The dataset looks at three different penguin species on three different Antarctic islands and captures the sex of each penguin, the length of its flippers, its weight, and the length and depth of its culmen (the upper ridge of its beak). We’ll run a box plot on the flipper length column and segregate the results by the penguin type and the island it came from.

## The boxplot() Function Syntax

You will find information on the `boxplot()` function in the pandas documentation. The syntax is straightforward, accepting the following parameters.

`pandas.plotting.boxplot(data, column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, **kwargs)`
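To get a feel for a few of these parameters before moving to the real data, here is a small sketch on a toy data frame. The frame `toy` and its column names are hypothetical, and the `Agg` backend line is only there so the snippet runs without a display; drop it and call `plt.show()` in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the penguin dataset.
toy = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "width": [2.0, 2.5, 3.0, 3.5, 4.0, 9.0],
})

ax = pd.plotting.boxplot(
    toy,
    column=["height", "width"],  # columns to draw, one box each
    rot=45,                      # rotate the x-axis labels by 45 degrees
    grid=False,                  # hide the background grid
    figsize=(6, 4),              # figure size in inches
    fontsize=10,                 # tick label font size
    return_type="axes",          # hand back the matplotlib Axes
)
ax.set_title("Toy example")
```

Asking for `return_type="axes"` is a convenient choice when you want to keep customising the plot with ordinary matplotlib calls, as the `set_title()` line does here.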

## Using boxplot()

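First, we need to import Pandas and create a data frame from the `.csv` file saved to our computer. The file name below is a placeholder, so substitute the path to your own copy of the dataset.

We will also use `matplotlib.pyplot` to plot the graph, so let’s import that too.

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'penguins.csv' is an assumed file name; point this at your saved copy
df = pd.read_csv('penguins.csv')
```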

Now we simply need to call the `boxplot()` function with the various parameters inserted.

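```python
import pandas as pd
import matplotlib.pyplot as plt

# 'penguins.csv' is an assumed file name; point this at your saved copy
df = pd.read_csv('penguins.csv')

# One box per (island, species) combination for the flipper length column
pd.plotting.boxplot(df, column=['flipper_length_mm'], by=['island', 'species'],
                    grid=False, figsize=(25, 18), fontsize=15)

plt.show()
```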

We’ll run that and here’s the result.

This plot shows inter-island differences in flipper length among the Adelie penguins, and the difference between the Chinstrap and Adelie penguins on Dream Island. Note the outliers on three of the five box plots.
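If you would like the numbers behind each box rather than reading them off the plot, a grouped `describe()` gives the quartiles per group. The sketch below uses a hypothetical miniature frame that borrows the column names from above; the measurements are invented for illustration and are not taken from the penguin dataset.

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
toy = pd.DataFrame({
    "island": ["Dream", "Dream", "Dream", "Dream",
               "Biscoe", "Biscoe", "Biscoe", "Biscoe"],
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap",
                "Adelie", "Adelie", "Adelie", "Adelie"],
    "flipper_length_mm": [185, 191, 195, 201, 186, 190, 188, 192],
})

# One row of summary statistics per (island, species) group,
# mirroring the grouping passed to boxplot() via `by`.
summary = toy.groupby(["island", "species"])["flipper_length_mm"].describe()

print(summary[["min", "25%", "50%", "75%", "max"]])
```

On the real data frame, the same call with `df` in place of `toy` should produce one row for each box in the plot.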

As a final note, there’s a shorthand way of calling `boxplot()` directly on the data frame, which looks like the following. Both calls produce the same plot. I used the longhand method as it aligns with the syntax you’ll see on the `pandas.plotting` module page.

```python
df.boxplot(column=['flipper_length_mm'], by=['island', 'species'],
           grid=False, figsize=(25, 18), fontsize=15)
```
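As a quick check that the two spellings behave alike, the sketch below draws both on a hypothetical toy frame (invented values, same column names as above) and compares the group labels that end up on the x-axis. The `Agg` backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
toy = pd.DataFrame({
    "island": ["Dream", "Dream", "Dream", "Dream",
               "Biscoe", "Biscoe", "Biscoe", "Biscoe"],
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap",
                "Adelie", "Adelie", "Adelie", "Adelie"],
    "flipper_length_mm": [185, 191, 195, 201, 186, 190, 188, 192],
})

# Longhand: module-level function, data frame passed as first argument.
pd.plotting.boxplot(toy, column=["flipper_length_mm"],
                    by=["island", "species"], grid=False)
longhand_labels = [t.get_text() for t in plt.gcf().axes[0].get_xticklabels()]

# Shorthand: the same call as a DataFrame method.
toy.boxplot(column=["flipper_length_mm"],
            by=["island", "species"], grid=False)
shorthand_labels = [t.get_text() for t in plt.gcf().axes[0].get_xticklabels()]

print(longhand_labels == shorthand_labels)
```

Each call draws onto a fresh figure, so `plt.gcf()` picks up the most recent one in both cases.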

## In Summary

We talked about the box plot as a method used in statistics to graphically show a group, or groups, of numerical data with their quartiles identified. You may also hear the box plot called a box-and-whisker plot.

Before introducing the Pandas plotting module function, `boxplot()`, we gave a quick overview of box plots and described their characteristics. Then we wrote some code using `boxplot()` and `matplotlib.pyplot` to interrogate the penguin dataset and produced a box plot of the flipper length column, grouped by island and species, allowing analysis.