Syntax
pandas.pivot(data, index=None, columns=None, values=None)
Return Value: The return value for the pivot()
function is a reshaped DataFrame
organized by index/column values.
Background
Direct quote from the Pandas Documentation website:
This function does not support data aggregation. If there are multiple values, they will result in a Multi-Index in the columns.
This article delves into each parameter for this function separately.
Preparation
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 'EID': [100, 101, 102, 103, 104], 'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'], 'Age': [19, 23, 21, 27, 36], 'Salary': [87343, 94123, 96543, 65232, 102375]}
π‘ Note: The data structure used in this article is a Dictionary of Lists made up of Keys (column names) and associated Values (list).
The “data” Parameter
The pivot()
function data
parameter is a DataFrame
. This parameter can be one of the following data types or another data type that converts to a DataFrame:
- CSV
- dictionary of lists (used in this article)
- dictionary of tuples, and more
If the DataFrame
is empty, the following output will display:
df = pd.DataFrame() print(df)
Output
Empty DataFrame Columns: [] Index: []
If the DataFrame contains staff
, the output will be similar to the table below.
df = pd.DataFrame(staff) print(df)
π‘ Note: Formatting will vary depending on the IDE used to run the code.
Output
FName | EID | Job | Age | Salary | |
0 | Clare | 100 | Designer I | 19 | 87343 |
1 | Micah | 101 | Data Scientist | 23 | 94123 |
2 | Ben | 102 | Developer | 21 | 96543 |
3 | Mac | 103 | Designer II | 27 | 65232 |
4 | Emma | 104 | Manager | 36 | 102375 |
The “index” Parameter
The pivot()
method’s index
parameter is not required. If used, these column(s) create a new index. If empty, the default index numbering system will display. The parameter can be one of the following data types:
- string
- object
- list of strings
- list of index names
π‘ Note: All data assigned to the index parameter must contain unique values. If not, a ValueError occurs.
In this example, the index
parameter is the EID and Job columns. The column parameter is required. If empty, the entire DataFrame will display. If missing, a TypeError will occur.
df_index = df.pivot(index=['EID', 'Job'], columns=[]) print(df_index)
Output
For this code example, the index parameter contains a list of column names. They display on the left in the same order as entered in the list: not the original data structure order.
staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 'EID': [100, 101, 102, 103, 104], 'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'], 'Age': [19, 23, 21, 27, 36], 'Salary': [87343, 94123, 96543, 65232, 102375]}
The remaining columns with the relevant data will display (FName
, Age
, Salary
) on the right.
EID | Job | FName | Age | Salary |
100 | Designer I | Clare | 19 | 87343 |
101 | Data Scientist | Micah | 23 | 94123 |
102 | Developer | Ben | 21 | 96543 |
103 | Designer II | Mac | 27 | 65232 |
104 | Manager | Emma | 36 | 102375 |
The “columns” Parameter
The Pandas pivot()
method’s columns
parameter is required. The parameter can be one of the following data types:
- string
- object
- list of strings
- list of index names
If empty (as shown above), the entire DataFrame displays. If missing, a TypeError occurs.
df_cols = df.pivot(index=['EID', 'Job', 'Salary'], columns=['FName']) print(df_cols)
Output
In this example, the index
columns will display on the left side. The order is the same as that listed in the above parameter.
The list passed to the columns
parameter (FNames
) shows the text as a sub-heading. The remaining sub-headings are the values from that column.
The Age
column is the only column omitted from both parameters. This column becomes the main table heading.
The Age
data from the original data structure will display as floats.
The “values” Parameter
This function’s value
parameter is not required. The parameter can be one of the following data types:
- string
- object
- list of the previous
This parameter uses column(s) for populating DataFrame values. If this parameter is empty, all remaining columns will display.
In this example, the Salary data from the original data structure will display as floats.
df_values = df.pivot(index=['EID', 'Job', 'Age'], columns=['FName'], values=['Salary']) print(df_values)
The DataFrame