Pandas pivot_table() - A Simple Guide with Video - Be on the Right Side of Change

In this tutorial, we will learn to use the Pandas function pivot_table(). This function is used to create a pivot table as a data frame. It allows for lots of customization possibilities to provide informative insights into our data.

Syntax Table

Here are the parameters from the official documentation:

Parameter	Type	Description
`data`	DataFrame
`values`	column to aggregate, optional
`index`	column, Grouper, array, or list of the previous	– If array, must be the same length as `data`. It’s used the same way as column values. – The list can contain any of the other types but no list. – Keys to group by on the pivot table index.
`columns`	column, Grouper, array, or list of the previous	– If array, must be the same length as the data. It’s used the same way as column values. – The list can contain any of the other types but no list. – Keys to group by on the pivot table column.
`aggfunc`	function, list of functions, dict, default `numpy.mean`	– If list of functions, the resulting pivot table will have hierarchical columns whose top level are the function names. Those are inferred from the function objects themselves. – If dictionary, the key is the column to aggregate and the *value* is function or list of functions.
`fill_value`	scalar, default `None`	Value to replace missing values with in the resulting pivot table, after aggregation.
`margins`	bool, default `False`	Add all row/columns. Example: subtotal/grand totals.
`dropna`	bool, default `True`	Do not include columns whose entries are all NaN.
`margins_name`	str, default `'All'`	Name of the row/column that will contain the totals when margins is `True`.
`observed`	bool, default `False`	This only applies if any of the groupers are *Categoricals*. – If `True`: only show observed values for categorical groupers. – If `False`: show all values for categorical groupers.
`sort`	bool, default `True`	Sort the result?

Returns	Type	Description
	DataFrame	An Excel-style pivot table.

Basic Example

To get started, we will have a look at an introductory example. We create a data frame that we will be using throughout the rest of the article:

import pandas as pd
import numpy as np

df = pd.DataFrame(
           {'Team': ['Reds', 'Reds', 'Reds', 'Reds', 'Reds', 'Reds',
                     'Reds', 'Blues', 'Blues', 'Blues', 'Blues',
                     'Blues', 'Blues', 'Blues'], 
            'Position': ['Pitcher', 'Pitcher', 'Pitcher', 'Center Field',
                         'Center Field', 'Center Field', 'Center Field',
                         'Pitcher', 'Pitcher', 'Pitcher', 'Center Field',
                         'Center Field', 'Center Field', 'Center Field'],
            'Throw Arm': ['Left', 'Right', 'Right', 'Left', 'Left',
                          'Right', 'Left', 'Left', 'Right', 'Right',
                          'Right', 'Left', 'Left', 'Left',],
            'Throwing Speed': [80, 75, 69, 71, 68, 59, 80,
                               81, 74, 68, 70, 79, 76, 73],
            'Running Speed': [19, 20, 18, 17, 19, 20, 16,
                              16, 19, 20, 19, 20, 21, 17]})

print(df)

The resulting DataFrame:

	Team	Position	Throw Arm	Throwing Speed	Running Speed
0	Reds	Pitcher	Left	80	19
1	Reds	Pitcher	Right	75	20
2	Reds	Pitcher	Right	69	18
3	Reds	Center Field	Left	71	17
4	Reds	Center Field	Left	68	19
5	Reds	Center Field	Right	59	20
6	Reds	Center Field	Left	80	16
7	Blues	Pitcher	Left	81	16
8	Blues	Pitcher	Right	74	19
9	Blues	Pitcher	Right	68	20
10	Blues	Center Field	Right	70	19
11	Blues	Center Field	Left	79	20
12	Blues	Center Field	Left	76	21
13	Blues	Center Field	Left	73	17

First, we import the necessary libraries, namely Pandas and Numpy. Then, we create a Pandas data frame and assign it to a variable called “df“. Finally, we output the data frame.

The data frame contains information about players of two Baseball teams. For each player, we get the player’s team, position, throw arm, throw speed, and running speed.

Now, we apply the pivot_table() function:

pd.pivot_table(df, values='Throwing Speed', 
               index='Team', aggfunc=np.mean)

Output:

Inside the function, we pass in the data frame as the first argument.

The second argument is the “values” parameter. This parameter expects the column or columns that we want to aggregate, and we assign it the column “Throwing Speed”.

The next parameter is the “index” parameter which we assign the column to be the index of the new data frame we create.

The final parameter here is the “aggfunc” parameter. This parameter expects the function or functions that we want to apply on the aggregated column. We set it equal to “np.mean” which is a Numpy function for calculating the mean value. We would not have to apply the “aggfunc” parameter to calculate the mean since the default value for this parameter is “np.mean“. But we do it anyway to make it easier to understand what is happening here.

The output shows a new data frame with the “Team” column as the index. For each team, we get the average throwing speed because the “Throwing Speed” column is the aggregated column and we assigned the “aggfunc” parameter the np.mean() function.

Apply Multiple Indexes

In the previous example, we saw how to calculate the average throwing speed of each team.

Now, we will go more into detail and calculate the average throwing speed for each position of each team. We achieve that by assigning the “index” parameter multiple columns:

pd.pivot_table(df, values='Throwing Speed', 
               index=['Team', 'Position'], aggfunc=np.mean)

Output:

We keep the rest of the function as it was before, but this time, we assign the “index” parameter a list of columns, namely the “Team” and the “Position” column.

This way, we created a multi-index data frame.

Add Columns to Group by

When we have a look at the initial data frame, we observe that we haven’t used the “Throw Arm” column in any way yet.

We could add it to the “index” parameter which creates this new data frame:

pd.pivot_table(df, values='Throwing Speed', 
               index=['Team', 'Position', 'Throw Arm'], aggfunc=np.mean)

Here’s the output table:

This way, we get the average throwing speed per throw arm per position per team. However, it looks a bit confusing here as there are too many index columns.

An alternative way to apply the “Throw Arm” column within our pivot_table() function, is to assign it to the “columns” parameter:

pd.pivot_table(df, values='Throwing Speed', index=['Team', 'Position'], 
               columns = ['Throw Arm'], aggfunc=np.mean)

Here’s the output:

When we compare the two pivot tables here, we see that they contain the same information. The first column, as well as the second column, both provide information about the average throwing speed for each throw arm of each position of each team. Nevertheless, the second approach is way clearer as we have half as many rows here.

Thus, combining the “index” parameter with the “columns” parameter allows us to crave lots of information out of our data in a comprehensible way.

Other Aggregate Functions

By now, we only applied the np.mean() function to our pivot table. Of course, there are multiple other functions to make use of. One example is the np.max() function:

pd.pivot_table(df, values='Throwing Speed', index=['Team', 'Position'], 
               columns = ['Throw Arm'], aggfunc=np.max)

The output:

So, here we calculate the maximum value of the throwing speed of each throwing arm, position, and team. For example, the maximum value for throwing speed of the left-handed, Center Field positioned Blue’s Team players is 79.

This is just one example. There are all sorts of other functions to apply here.

Apply Multiple Functions

The “aggfunc” parameter allows us to perform multiple functions at once. So, the resulting data frame provides us with even more information about our data.

But to achieve that, we first have to assign the “values” parameter multiple columns to execute different functions on different columns:

pd.pivot_table(df, values=['Throwing Speed', 'Running Speed'], 
               index=['Team', 'Position'], columns = ['Throw Arm'], 
               aggfunc={'Throwing Speed': np.mean, 
                        'Running Speed': np.median})

Output:

We now assign the “values” parameter the “Throwing Speed” column, as well as the “Running Speed” column. The “aggfunc” parameter gets assigned a dictionary. The dictionary’s keys are the column names to perform the function on and the dictionary’s values are the functions. We calculate the mean for the “Throwing Speed” and the median for the “Running Speed”.

We can extend that even further by deploying multiple functions to a column. We do that by applying a list of functions within the dictionary:

pd.pivot_table(df, values=['Throwing Speed', 'Running Speed'],
               index=['Team', 'Position'], columns = ['Throw Arm'],
               aggfunc={'Throwing Speed': np.mean,
                        'Running Speed': [np.min, np.max, np.median]})

Output:

We assign the “Running Speed” column in our “aggfunc” parameter a list of functions, namely the np.min() function, the np.max() function, and the np.median() function. The outputted data frame shows the name of each function at the top.

Summary

All in all, the pivot_table() function provides us with lots of opportunities to create a pivot table that puts out a lot of information about our data. We learned how to use the function’s various parameters to analyze our data in several ways. We saw how to create a pivot table, how to do so with multiple indexes, how to apply key columns to group by, and how to add multiple functions for aggregation.

For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter Blog page.

Happy Coding!