Python Pandas pivot()

Rate this post

Syntax

pandas.pivot(data, index=None, columns=None, values=None)

Return Value: The return value for the pivot() function is a reshaped DataFrame organized by index/column values.


Background

Direct quote from the Pandas Documentation website:

This function does not support data aggregation. If there are multiple values, they will result in a Multi-Index in the columns.

This article delves into each parameter for this function separately.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.


$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

staff = {'FName':  ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID':    [100, 101, 102, 103, 104], 
         'Job':    ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age':    [19, 23, 21, 27, 36],
         'Salary': [87343, 94123, 96543, 65232, 102375]}

💡 Note: The data structure used in this article is a Dictionary of Lists made up of Keys (column names) and associated Values (list).


The “data” Parameter

The pivot() function data parameter is a DataFrame. This parameter can be one of the following data types or another data type that converts to a DataFrame:

If the DataFrame is empty, the following output will display:

df = pd.DataFrame()
print(df)

Output

Empty DataFrame
Columns: []
Index: []

If the DataFrame contains staff, the output will be similar to the table below.

df = pd.DataFrame(staff)
print(df)

💡 Note: Formatting will vary depending on the IDE used to run the code.

Output

 FName EIDJobAgeSalary
0Clare100Designer I1987343
1Micah101Data Scientist2394123
2Ben102Developer2196543
3Mac103Designer II2765232
4Emma104Manager36102375

The “index” Parameter

The pivot() method’s index parameter is not required. If used, these column(s) create a new index. If empty, the default index numbering system will display. The parameter can be one of the following data types:

  • string
  • object
  • list of strings
  • list of index names

💡 Note: All data assigned to the index parameter must contain unique values. If not, a ValueError occurs.

In this example, the index parameter is the EID and Job columns. The column parameter is required. If empty, the entire DataFrame will display. If missing, a TypeError will occur.

df_index = df.pivot(index=['EID', 'Job'], columns=[])
print(df_index)

Output

For this code example, the index parameter contains a list of column names. They display on the left in the same order as entered in the list: not the original data structure order.

staff = {'FName':  ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID':    [100, 101, 102, 103, 104], 
         'Job':    ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age':    [19, 23, 21, 27, 36],
         'Salary': [87343, 94123, 96543, 65232, 102375]}

The remaining columns with the relevant data will display (FName, Age, Salary) on the right.

EIDJobFName AgeSalary
100Designer IClare1987343
101Data ScientistMicah2394123
102DeveloperBen2196543
103Designer IIMac2765232
104ManagerEmma36102375

The “columns” Parameter

The Pandas pivot() method’s columns parameter is required. The parameter can be one of the following data types:

  • string
  • object
  • list of strings
  • list of index names

If empty (as shown above), the entire DataFrame displays. If missing, a TypeError occurs.

df_cols = df.pivot(index=['EID', 'Job', 'Salary'], columns=['FName'])
print(df_cols)

Output

In this example, the index columns will display on the left side. The order is the same as that listed in the above parameter.

The list passed to the columns parameter (FNames) shows the text as a sub-heading. The remaining sub-headings are the values from that column.

The Age column is the only column omitted from both parameters. This column becomes the main table heading.

The Age data from the original data structure will display as floats.


The “values” Parameter

This function’s value parameter is not required. The parameter can be one of the following data types:

  • string
  • object
  • list of the previous

This parameter uses column(s) for populating DataFrame values. If this parameter is empty, all remaining columns will display.

In this example, the Salary data from the original data structure will display as floats.

df_values = df.pivot(index=['EID', 'Job', 'Age'], columns=['FName'], values=['Salary'])
print(df_values)

The DataFrame