Python Pandas melt()

Syntax

pandas.melt(frame, 
            id_vars=None,
            value_vars=None, 
            var_name=None, 
            value_name='value', 
            col_level=None, 
            ignore_index=True)

Return Value

The return value for the melt() function is an unpivoted DataFrame.

Background

Direct quote from the Pandas Documentation website:

“This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.”

If the DataFrame contains numerous columns, melt() lets you keep a chosen set of columns as identifiers and stack the remaining columns into rows. Doing this reshapes the data from a wide (landscape) layout to a long (portrait) layout, which is often easier to filter, group, and plot.
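To make the wide-to-long idea concrete, here is a minimal sketch. The wide DataFrame and its column names (Name, Math, Art) are hypothetical and are not part of this article's staff data.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject.
wide = pd.DataFrame({'Name': ['Ann', 'Bob'],
                     'Math': [90, 75],
                     'Art': [85, 60]})

# Keep Name as the identifier and unpivot the two subject columns.
long = pd.melt(wide, id_vars=['Name'])

print(wide.shape)  # (2, 3): 2 rows, 3 columns
print(long.shape)  # (4, 3): 2 rows x 2 unpivoted columns, 3 columns
print(long)
```

The row count grows (one row per identifier and unpivoted column pair), while the column count shrinks to the identifier column plus variable and value.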

This article delves into each parameter for this function separately.

Preparation

Before any data manipulation can occur, one new library requires installation.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal and execute the code below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

staff = {'FName':  ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID': [100, 101, 102, 103, 104], 
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}

The “frame” Parameter

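The melt() frame parameter is the DataFrame to unpivot. melt() does not convert other objects itself, so first load or convert the source data into a DataFrame. The source can be one of the following, or another data type that converts to a DataFrame: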

CSV
dictionary of lists (used in this article)
dictionary of tuples, and more

If the DataFrame is empty, printing it displays the following output:

df = pd.DataFrame()
print(df)

Output

Empty DataFrame
Columns: []
Index: []

If the DataFrame is created from the staff dictionary, the output will be similar to the table below.

💡 Note: Formatting will vary depending on the IDE used to run the code.

df = pd.DataFrame(staff)
print(df)

Output

	FName	EID	Job	Age
0	Clare	100	Designer I	19
1	Micah	101	Data Scientist	23
2	Ben	102	Developer	21
3	Mac	103	Designer II	27
4	Emma	104	Manager	36
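As a small illustration of the conversion step, the sketch below builds a DataFrame from a dictionary of tuples and then melts it. The staff_tuples dictionary is a hypothetical two-row subset of the staff data, used here only to keep the output short.

```python
import pandas as pd

# Hypothetical two-row subset of the staff data, stored as tuples.
staff_tuples = {'FName': ('Clare', 'Micah'),
                'EID': (100, 101)}

# Convert to a DataFrame first; melt() expects a DataFrame as frame.
df_small = pd.DataFrame(staff_tuples)

# pd.melt(frame) and frame.melt() are two spellings of the same operation.
melted = pd.melt(df_small)
print(melted)
```

With no other arguments, every column is unpivoted, so the result here has only the variable and value columns.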

The “id_vars” Parameter

The melt() id_vars parameter is not required and can be one of the following data types:

tuple
list
ndarray

This parameter passes the column name(s) to use as identifier variable(s). Each name must exist in the DataFrame. The parameter may contain one or more column names, and each name must be unique.

df_id_vars = pd.melt(df, id_vars=['Job'])
print(df_id_vars)

Line [1] passes a list with one element to the id_vars parameter.
Line [2] outputs the contents to the terminal.

Output

In this example, the id_vars parameter is a list with one element, Job. The Job column displays immediately to the right of the index column.

💡 Note: The id_vars columns display to the right of the default index column in the same order as they appear in the id_vars list.

Looking at the original data structure, you will see that Job was originally the third column.

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 
         'EID':   [100, 101, 102, 103, 104], 
         'Job':   ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age':   [19, 23, 21, 27, 36]}

The output displays the Job for each staff member three times, once for each remaining column:

FName
EID
Age

	Job	variable	value
0	Designer I	FName	Clare
1	Data Scientist	FName	Micah
2	Developer	FName	Ben
3	Designer II	FName	Mac
4	Manager	FName	Emma
5	Designer I	EID	100
6	Data Scientist	EID	101
7	Developer	EID	102
8	Designer II	EID	103
9	Manager	EID	104
10	Designer I	Age	19
11	Data Scientist	Age	23
12	Developer	Age	21
13	Designer II	Age	27
14	Manager	Age	36
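The sketch below extends the example above to two identifier columns and shows what happens when a name is missing. The column name Salary is a deliberately non-existent, hypothetical name. The KeyError shown is the behavior I would expect from recent pandas versions; treat the exact exception as an assumption for older releases.

```python
import pandas as pd

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'],
         'EID': [100, 101, 102, 103, 104],
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}
df = pd.DataFrame(staff)

# Two identifier columns: they lead the result in the order given here.
df_two_ids = pd.melt(df, id_vars=['Job', 'EID'])
print(df_two_ids.columns.tolist())  # ['Job', 'EID', 'variable', 'value']
print(len(df_two_ids))              # 10: 5 rows x 2 remaining columns

# 'Salary' is a hypothetical column that does not exist in df.
missing_raised = False
try:
    pd.melt(df, id_vars=['Salary'])
except KeyError:
    missing_raised = True
print(missing_raised)
```

Only FName and Age remain to be unpivoted, so each Job and EID pair appears twice.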

The value_vars Parameter

The melt() value_vars parameter is not required and may be one of the following data types:

tuple
list
ndarray

This parameter lists the column(s) to unpivot. If not specified, all columns that are not set as id_vars are unpivoted.

df_val_vars = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'])
print(df_val_vars)

Output

In this example, Job remains set as id_vars (see above).

The Job for each staff member is displayed twice, once for each column listed in the value_vars parameter:

	Job	variable	value
0	Designer I	EID	100
1	Data Scientist	EID	101
2	Developer	EID	102
3	Designer II	EID	103
4	Manager	EID	104
5	Designer I	Age	19
6	Data Scientist	Age	23
7	Developer	Age	21
8	Designer II	Age	27
9	Manager	Age	36
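To compare the two cases side by side, the following sketch melts the staff data with and without value_vars and lists which columns ended up in the variable column. The variable names all_cols and two_cols are illustrative only.

```python
import pandas as pd

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'],
         'EID': [100, 101, 102, 103, 104],
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}
df = pd.DataFrame(staff)

# No value_vars: every column except the id_vars column is unpivoted.
all_cols = pd.melt(df, id_vars=['Job'])

# With value_vars: only the listed columns are unpivoted; FName is dropped.
two_cols = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'])

print(all_cols['variable'].unique().tolist())  # ['FName', 'EID', 'Age']
print(two_cols['variable'].unique().tolist())  # ['EID', 'Age']
print(len(all_cols), len(two_cols))            # 15 10
```

Columns that are in neither id_vars nor value_vars, such as FName in the second call, do not appear in the result at all.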

The var_name Parameter

The melt() var_name parameter is not required and is a scalar. It sets the heading of the variable column. If None, the heading is frame.columns.name when that is set, or the word variable otherwise.

df_var_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], var_name='EID/Age')
print(df_var_name)

Output

After running this code, the heading of the variable column changes to EID/Age.

	Job	EID/Age	value
0	Designer I	EID	100
1	Data Scientist	EID	101
2	Developer	EID	102
3	Designer II	EID	103
4	Manager	EID	104
5	Designer I	Age	19
6	Data Scientist	Age	23
7	Developer	Age	21
8	Designer II	Age	27
9	Manager	Age	36
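The frame.columns.name fallback is easy to miss, so here is a minimal sketch of it. The name Field assigned to the columns is a hypothetical label chosen for this illustration.

```python
import pandas as pd

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'],
         'EID': [100, 101, 102, 103, 104],
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}

df_named = pd.DataFrame(staff)
# Give the column index a name (hypothetical label for this example).
df_named.columns.name = 'Field'

# var_name is left as None, so the columns name is used as the heading.
df_fallback = pd.melt(df_named, id_vars=['Job'], value_vars=['EID', 'Age'])
print(df_fallback.columns.tolist())  # ['Job', 'Field', 'value']
```

Passing var_name explicitly, as in the example above, takes precedence over this fallback.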

The value_name Parameter

The melt() value_name parameter is not required and is a scalar. It sets the heading of the value column. The default is the word value.

df_val_name = pd.melt(df, id_vars=['Job'], value_vars=['EID', 'Age'], 
                      var_name='EID/Age', value_name='Data')
print(df_val_name)

Output

After running this code, the heading of the value column changes to Data.

	Job	EID/Age	Data
0	Designer I	EID	100
1	Data Scientist	EID	101
2	Developer	EID	102
3	Designer II	EID	103
4	Manager	EID	104
5	Designer I	Age	19
6	Data Scientist	Age	23
7	Developer	Age	21
8	Designer II	Age	27
9	Manager	Age	36

The col_level Parameter

The melt() col_level parameter is not required and can be an integer or string data type. If the columns are a MultiIndex, this parameter selects the level to melt.

df_col_level = df.melt(col_level=0)
print(df_col_level)

Output

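In this example, the DataFrame has a single level of column labels, so col_level=0 refers to that only level. With no id_vars set, every column is unpivoted, and each column name is displayed consecutively with its data in the order the columns appear in the original data structure (see starter code above).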

	variable	value
0	FName	Clare
1	FName	Micah
2	FName	Ben
3	FName	Mac
4	FName	Emma
5	EID	100
6	EID	101
7	EID	102
8	EID	103
9	EID	104
10	Job	Designer I
11	Job	Data Scientist
12	Job	Developer
13	Job	Designer II
14	Job	Manager
15	Age	19
16	Age	23
17	Age	21
18	Age	27
19	Age	36
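Because the staff DataFrame has only one level of column labels, the effect of col_level is easier to see on MultiIndex columns. The sketch below is an assumption-laden illustration: the two-level labels (Identifier/Demographic over EID/Age) are hypothetical and are not part of the article's data.

```python
import pandas as pd

# Hypothetical two-level column labels over a slice of the staff data.
columns = pd.MultiIndex.from_tuples([('Identifier', 'EID'),
                                     ('Demographic', 'Age')])
df_multi = pd.DataFrame([[100, 19], [101, 23]], columns=columns)

# Melt using the labels from the top level (level 0).
level0 = df_multi.melt(col_level=0)

# Melt using the labels from the bottom level (level 1).
level1 = df_multi.melt(col_level=1)

print(level0['variable'].tolist())  # ['Identifier', 'Identifier', 'Demographic', 'Demographic']
print(level1['variable'].tolist())  # ['EID', 'EID', 'Age', 'Age']
```

The values are identical in both results; only the labels placed in the variable column change with the chosen level.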

The ignore_index Parameter

The ignore_index parameter is not required and can be True or False (Boolean). The default is True.

df_ig_index = pd.melt(df, ignore_index=True)
print(df_ig_index)

Output

If True, the original index is discarded and replaced with a new sequential index (0 through 19 in this example). The output is as follows:

	variable	value
0	FName	Clare
1	FName	Micah
2	FName	Ben
3	FName	Mac
4	FName	Emma
5	EID	100
6	EID	101
7	EID	102
8	EID	103
9	EID	104
10	Job	Designer I
11	Job	Data Scientist
12	Job	Developer
13	Job	Designer II
14	Job	Manager
15	Age	19
16	Age	23
17	Age	21
18	Age	27
19	Age	36

df_ig_index = pd.melt(df, ignore_index=False)
print(df_ig_index)

Output

If False, the original index is retained, so the labels 0 through 4 repeat for each unpivoted column.

	variable	value
0	FName	Clare
1	FName	Micah
2	FName	Ben
3	FName	Mac
4	FName	Emma
0	EID	100
1	EID	101
2	EID	102
3	EID	103
4	EID	104
0	Job	Designer I
1	Job	Data Scientist
2	Job	Developer
3	Job	Designer II
4	Job	Manager
0	Age	19
1	Age	23
2	Age	21
3	Age	27
4	Age	36
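Retaining the index is more useful when the index carries meaning. As a sketch, the example below moves FName into the index before melting; using FName as the index is an illustrative choice, not something the original examples do.

```python
import pandas as pd

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'],
         'EID': [100, 101, 102, 103, 104],
         'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'],
         'Age': [19, 23, 21, 27, 36]}

# Use FName as the index so the retained labels are easy to spot.
df_by_name = pd.DataFrame(staff).set_index('FName')

kept = pd.melt(df_by_name, value_vars=['EID', 'Age'], ignore_index=False)
reset = pd.melt(df_by_name, value_vars=['EID', 'Age'], ignore_index=True)

print(kept.index.tolist())   # the five names, repeated once per unpivoted column
print(reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With ignore_index=False, each row of the result can still be traced back to the staff member it came from without adding FName to id_vars.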

Sources:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html