Python Pandas melt()

Syntax

pandas.melt(frame, 
            id_vars=None,
            value_vars=None, 
            var_name=None, 
            value_name='value', 
            col_level=None, 
            ignore_index=True)

Return Value

The return value for the melt() function is an unpivoted DataFrame.

Background

Direct quote from the Pandas Documentation website:

“This function massages a DataFrame into a format where one or more columns are identifier variables (id_vars). While all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, 'variable' and 'value'!”

In essence, if the DataFrame contains numerous columns with vast amounts of data, you can restrict columns to a specified amount. Doing this will change the viewport from landscape to portrait: a more manageable solution.

This article delves into each parameter for this function separately.

Getting Started

Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

The data structure used in this article is a dictionary of lists made up of Keys (column names) and associated Values (list).

Required Starter Code:

import pandas as pd

staff = {'FName':  ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID': [100, 101, 102, 103, 104], 
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}

The “frame” Parameter

The melt() frame parameter is a DataFrame. The parameter can be one of the following data types or another data type that converts to a DataFrame:

  • CSV
  • dictionary of lists (used in this article)
  • dictionary of tuples, and more

If this parameter is empty, the following output will display:

df = pd.DataFrame()
print(df)

Output:

Empty DataFrame
Columns: []
Index: []

If the DataFrame contains the parameter staff, the output will be similar to the table below. Formatting will vary and depend on the IDE used to run the code.

df = pd.DataFrame(staff)
print(df)

Output:

 FName EIDJobAge
0Clare100Designer I19
1Micah101Data Scientist23
2Ben102Developer21
3Mac103Designer II27
4Emma104Manager36

❗ Note: For all examples in this article, use the staff DataFrame.

The “id_vars” Parameter

The melt() id_vars parameter is not required and can be one of the following data types:

These data types pass the column names. These are used as identifier variable(s) and must exist in the DataFrame.  This parameter may contain single or multiple column names and must be unique values.

df_id_vars = pd.melt(df, id_vars=['Job'])
print(df_id_vars)
  • Line [1] passes a list with one element to the id_vars parameter.

Output:

In this example, the id_vars parameter is a list with one element, Job.  The Job element column displays to the right of the index column.

Note: These columns display to the right of the default index column in the same order as they appear in the id_vars list.

Looking at the original data structure, you will see that the original column position of Job is three.

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID':   [100, 101, 102, 103, 104], 
         'Job':   ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age':   [19, 23, 21, 27, 36]}

The output displays the Job for each staff member three times. Once for each remaining column:

  • FName
  • EID
  • Age
 Jobvariablevalue
0Designer IFNameClare
1Data ScientistFNameMicah
2DeveloperFNameBen
3Designer IIFNameMac
4ManagerFNameEmma
5Designer IEID100
6Data ScientistEID101
7DeveloperEID102
8Designer IIEID103
9ManagerEID104
10Designer IAge19
11Data ScientistAge23
12DeveloperAge21
13Designer IIAge27
14ManagerAge36

The value_vars Parameter

The melt() value_vars parameter is not required and maybe one of the following data types:

  • tuple
  • list
  • ndarray

This parameter lists the column(s) to unpivot. If empty, all columns will display.

df_val_vars = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'])
print(df_val_vars)

Output:

In this example, the Job list remains set as id_vars (see above).  

The Job for each staff member is displayed twice. Once for each column listed in the value_vars parameter:

  • EID
  • Age
 Jobvariablevalue
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The var_name Parameter

The melt() var_name is not required and scalar. This name is the name used for the variable column heading. If None, frame.columns.name or the word variable will display.

df_var_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], var_name='EID/Age')
print(df_var_name)

Output:

After running this code, the var_name column heading changes to EID/Age.

 JobEID/Agevalue
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The value_name Parameter

The melt() value_name parameter is not required and scalar. This name is the name to use for the value column heading. If None, the word value is used.

df_val_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], 
                      var_name='EID/Age', value_name='Data')
print(df_val_name)

Output:

After running this code, the value_name column changes to Data.

 JobEID/AgeData
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The col_level Parameter

The melt() col_level parameter is not required and can be an integer or string data type. If columns are multi-index, use this level to melt.

df_col_level = df.melt(col_level=0)
print (df_col_level)

Output:

In this example, each column name is displayed consecutively with relevant data in the order they appear in the original data structure (see starter code above).

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
5EID            100
6EID            101
7EID            102
8EID            103
9EID            104
10Job     Designer I
11Job     Data Scientist
12Job     Developer
13Job     Designer II
14Job     Manager
15Age             19
16Age             23
17Age             21
18Age             27
19Age             36


The ignore_index Parameter

The ignore_index parameter is not required and can be True or False (Boolean).

df_ig_index = pd.melt(df, ignore_index=True)
print(df_ig_index)

Output:

If True, the original index column does not display. Instead, the output is as follows:

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
5EID100
6EID101
7EID102
8EID103
9EID104
10JobDesigner I
11JobData Scientist
12JobDeveloper
13JobDesigner II
14JobManager
15Age19
16Age23
17Age21
18Age27
19Age36
df_ig_index = pd.melt(df, ignore_index=False)
print(df_ig_index)

Output:

If False, the original index does display (retained).

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
0EID100
1EID101
2EID102
3EID103
4EID104
0JobDesigner I
1JobData Scientist
2JobDeveloper
3JobDesigner II
4JobManager
0Age19
1Age23
2Age21
3Age27
4Age36

Sources: