Python Pandas melt()

Rate this post

Syntax

pandas.melt(frame, 
            id_vars=None,
            value_vars=None, 
            var_name=None, 
            value_name='value', 
            col_level=None, 
            ignore_index=True)

Return Value

The return value for the melt() function is an unpivoted DataFrame.


Background

Direct quote from the Pandas Documentation website:

“This function massages a DataFrame into a format where one or more columns are identifier variables (id_vars). While all other columns are considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, 'variable' and 'value'!”

If the DataFrame contains numerous columns with vast amounts of data, you can restrict columns to a specified amount. Doing this will change the viewport from landscape to portrait: a more manageable solution.

This article delves into each parameter for this function separately.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.


$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

staff = {'FName':  ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID': [100, 101, 102, 103, 104], 
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}

The “frame” Parameter

The melt() frame parameter is a DataFrame. The parameter can be one of the following data types or another data type that converts to a DataFrame:

  • CSV
  • dictionary of lists (used in this article)
  • dictionary of tuples, and more

If this parameter is empty, the following output will display:

df = pd.DataFrame()
print(df)

Output

Empty DataFrame
Columns: []
Index: []

If the DataFrame contains the parameter staff, the output will be similar to the table below.

💡 Note: Formatting will vary depending on the IDE used to run the code.

df = pd.DataFrame(staff)
print(df)

Output

 FName EIDJobAge
0Clare100Designer I19
1Micah101Data Scientist23
2Ben102Developer21
3Mac103Designer II27
4Emma104Manager36

The “id_vars” Parameter

The melt() id_vars parameter is not required and can be one of the following data types:

These data types pass the column names. These are used as identifier variable(s) and must exist in the DataFrame. This parameter may contain single or multiple column names and must be unique values.

df_id_vars = pd.melt(df, id_vars=['Job'])
print(df_id_vars)
  • Line [1] passes a list with one element to the id_vars parameter.
  • Line [2] outputs the contents to the terminal.

Output

In this example, the id_vars parameter is a list with one element, Job. The Job element column displays to the right of the index column.

💡Note: These columns show to the right of the default index column in the same order as they appear in the id_vars list.

Looking at the original data structure, you will see that the original column position of Job is three.

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID':   [100, 101, 102, 103, 104], 
         'Job':   ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age':   [19, 23, 21, 27, 36]}

The output displays the Job for each staff member three times. Once for each remaining column:

  • FName
  • EID
  • Age
 Jobvariablevalue
0Designer IFNameClare
1Data ScientistFNameMicah
2DeveloperFNameBen
3Designer IIFNameMac
4ManagerFNameEmma
5Designer IEID100
6Data ScientistEID101
7DeveloperEID102
8Designer IIEID103
9ManagerEID104
10Designer IAge19
11Data ScientistAge23
12DeveloperAge21
13Designer IIAge27
14ManagerAge36

The value_vars Parameter

The melt() value_vars parameter is not required and maybe one of the following data types:

  • tuple
  • list
  • ndarray

This parameter lists the column(s) to unpivot. If empty, all columns will display.

df_val_vars = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'])
print(df_val_vars)

Output

In this example, the Job list remains set as id_vars (see above).  

The Job for each staff member is displayed twice. Once for each column listed in the value_vars parameter:

  • EID
  • Age
 Jobvariablevalue
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The var_name Parameter

The melt() var_name is not required and scalar. This name is the name used for the variable column heading. If None, frame.columns.name or the word variable will display.

df_var_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], var_name='EID/Age')
print(df_var_name)

Output

After running this code, the var_name column heading changes to EID/Age.

 JobEID/Agevalue
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The value_name Parameter

The melt() value_name parameter is not required and scalar. This name is the name to use for the value column heading. If None the word value is used.

df_val_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], 
                      var_name='EID/Age', value_name='Data')
print(df_val_name)

Output

After running this code, the value_name column changes to Data.

 JobEID/AgeData
0Designer IEID100
1Data ScientistEID101
2DeveloperEID102
3Designer IIEID103
4ManagerEID104
5Designer IAge19
6Data ScientistAge23
7DeveloperAge21
8Designer IIAge27
9ManagerAge36

The col_level Parameter

The melt() col_level parameter is not required and can be an integer or string data type. If columns are multi-index, use this level to melt.

df_col_level = df.melt(col_level=0)
print (df_col_level)

Output

In this example, each column name is displayed consecutively with relevant data in the order they appear in the original data structure (see starter code above).

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
5EID            100
6EID            101
7EID            102
8EID            103
9EID            104
10Job     Designer I
11Job     Data Scientist
12Job     Developer
13Job     Designer II
14Job     Manager
15Age             19
16Age             23
17Age             21
18Age             27
19Age             36

The ignore_index Parameter

The ignore_index parameter is not required and can be True or False (Boolean).

df_ig_index = pd.melt(df, ignore_index=True)
print(df_ig_index)

Output

If True, the original index column does not display. Instead, the output is as follows:

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
5EID100
6EID101
7EID102
8EID103
9EID104
10JobDesigner I
11JobData Scientist
12JobDeveloper
13JobDesigner II
14JobManager
15Age19
16Age23
17Age21
18Age27
19Age36
df_ig_index = pd.melt(df, ignore_index=False)
print(df_ig_index)

Output

If False, the original index does display (retained).

 variablevalue
0FName          Clare
1FName          Micah
2FName          Ben
3FName          Mac
4FName          Emma
0EID100
1EID101
2EID102
3EID103
4EID104
0JobDesigner I
1JobData Scientist
2JobDeveloper
3JobDesigner II
4JobManager
0Age19
1Age23
2Age21
3Age27
4Age36

Sources: