In this tutorial, we will learn how to use the Pandas melt()
function which turns a wide data frame into a long one. The function unpivots selected parts of the data frame, so these columns get turned into rows.
Here are the parameters from the official documentation:
Parameter | Type | Description |
---|---|---|
id_vars | tuple, list, or ndarray, optional | Column(s) to use as identifier variables. |
value_vars | tuple, list, or ndarray, optional | Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars . |
var_name | scalar | Name to use for the variable column. If None it uses frame.columns.name or βvariableβ. |
value_name | scalar, default βvalueβ | Name to use for the value column. |
col_level | int or str, optional | If columns are a MultiIndex then use this level to melt. |
ignore_index | bool, default True | If True , original index is ignored. If False , the original index is retained. Index labels will be repeated as necessary. |
Return Value: The pd.melt()
function returns an unpivoted DataFrame.
Basic Example
We will start with an introductory example to get a basic understanding of the Pandas melt()
function:
import pandas as pd df = pd.DataFrame({'Student': ['Alice', 'Mary', 'Bob'], 'Major': ['Biology', 'CS', 'CS'], 'Age': [21, 20, 25]}) df
Student | Major | Age | |
0 | Alice | Biology | 21 |
1 | Mary | CS | 20 |
2 | Bob | CS | 25 |
First, we import the Pandas library. Then we create a Pandas data frame that contains information about students. Finally, we output the data frame. We see each student’s name, major, and age.
Now, we apply the Pandas melt()
function:
pd.melt(df, id_vars=['Student'], value_vars=['Major'])
Student | variable | value | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
Inside the function, we put in the data frame and we apply the parameters “id_vars
” and “value_vars
“.
- The “
id_vars
” parameter expects a list of the columns to use as the identifier variables. These are the columns that remain unchanged. For this example, we choose the “Student
” column. - The “
value_vars
” parameter expects a list of the columns to be unpivoted. Or in other words: the columns that get transformed into rows. We select the “Major
” column to be converted into rows.
The output shows a modified data frame. The “Student
” column remains the same as before. But now, we neither have the “Major
” column, nor the “Age
” column and instead, we have the columns “variable
” and “value
“.
The “variable
” column’s elements all say “Major
” because we unpivoted the “Major
” column and turned it into row values. The “value
” column contains the respective “Major
” value for each student.
Compared to the initial data frame, we see that the “Age
” column is now completely missing. That’s because we did not incorporate it into the melt()
function.
Unpivot Multiple Columns
In the previous example, we did not include the “Age
” column because we did not assign it to the “id_vars
” parameter or the “value_vars
” parameter. All columns that are not assigned to one of these parameters are dropped.
But since the “value_vars
” parameter expects a list of columns and not necessarily one column only, we can apply multiple columns to the parameter:
pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'])
Student | variable | value | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
3 | Alice | Age | 21 |
4 | Mary | Age | 20 |
5 | Bob | Age | 25 |
Here, we run the melt()
function the same way as before. But this time, we assign two columns to the “value_vars
” parameter, namely the “Major
” column and the “Age
” column. This way, we unpivot two columns instead of one.
As we can see, the data frame now has twice as many rows as before because the “variable
” column now contains two unpivoted columns, the “Major
” column, and the “Age
” column. Thus, the “value
” column now contains values for both unpivoted columns.
The column that we use as the identifier variable (the “Student
” column) holds every student’s name two times. That’s because for each student we now have a major and age in two separate rows since the “Major
” column and the “Age
” column were unpivoted.
For example, Alice majors in Biology and is 21 years old. These are two separate rows in the new data frame, whereas this was only one row in the initial one.
Ignore the Index
In the last section, we created a data frame that was twice as long as the initial one. The longer data frame has its own indexes as we can see here:
Student | variable | value | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
3 | Alice | Age | 21 |
4 | Mary | Age | 20 |
5 | Bob | Age | 25 |
But maybe, we want to keep the original indexes to keep track of the size of the original data frame.
We achieve that by applying the “ignore_index
” parameter and assigning it “False
“. In the section above this parameter was set to “True” by default thus creating new indexes and representing the length of the new data frame.
If we set it to “False
“, this is what we get:
pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'], ignore_index=False)
Student | variable | value | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
0 | Alice | Age | 21 |
1 | Mary | Age | 20 |
2 | Bob | Age | 25 |
The rest of the melt()
function remains the same. We only change the “ignore_index
” parameter. Now, we keep the original indexes.
For example, both “Alice” rows now have the index “0” because, in the original data frame, the “Alice” row’s index was “0” as well, whereas in the previous data frame the first “Alice” row’s index was set to “0” and the second “Alice” row’s index was set to “3”.
Multiple Identifier Columns
We already saw how we can unpivot multiple columns. We achieved that by applying multiple column names to the “value_vars
” parameter which were then unpivoted.
The “id_vars
” parameter also expects a list of columns, hence we can apply multiple columns here, too.
pd.melt(df, id_vars=['Student', 'Major'], value_vars=['Age'])
Student | Major | variable | value | |
0 | Alice | Biology | Age | 21 |
1 | Mary | CS | Age | 20 |
2 | Bob | CS | Age | 25 |
We assign two columns to the “id_vars
” parameter: The “Student
” column and the “Major
” column. So, these columns remain unchanged.
The “value_vars
” parameter gets assigned the column “Age
“. That means, this column gets unpivoted and thus converted into row values. So, the “variable
” column’s elements all say “Age
” and the “value
” column contains the respective “Age
” values of each student.
Applying multiple columns to both the “id_vars
” parameter as well as the “value_vars
” parameter makes the melt()
function very flexible since we can unpivot our data frame in lots of different ways.
Label the “variable” and “value” columns
By now, when using the melt()
function, the resulting data frame always contained two new columns, “variable
” and “value
“. These names can be confusing since in some data frames it might not be obvious what the columns contain.
Luckily, the melt()
function provides us with the parameters “var_name
” and “value_name
“. Using these parameters, we can label the resulting “variable
” and “value
” columns by assigning them the desired name as a string value.
pd.melt(df, id_vars=['Student', 'Major'], value_vars=['Age'], var_name='Age column', value_name='Age values')
Student | Major | Age column | Age values | |
0 | Alice | Biology | Age | 21 |
1 | Mary | CS | Age | 20 |
2 | Bob | CS | Age | 25 |
We apply the same melt()
function as in the previous section. But this time, we use the naming parameters and assign the “var_name
” parameter the string "Age column"
and the “value_name
” parameter the value "Age values"
. The resulting data frame contains these new column labels.
In this example, it was relatively easy to name the columns since we only had one unpivoted column (the “Age
” column). So, we simply named the new columns after the “Age
” column.
However, when we have multiple unpivoted columns, the naming process is not so straightforward. We have to find comprehensible labels.
Let’s have a look at the melt()
function example from above with multiple columns to unpivot:
pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'])
Student | variable | value | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
3 | Alice | Age | 21 |
4 | Mary | Age | 20 |
5 | Bob | Age | 25 |
The “variable
” column contains the values “Major
” and “Age
“. And the “value
” parameter contains the respective values for these variables. Consequently, we need to find names that fit here properly:
pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'], var_name='Major/Age column', value_name='Major/Age values')
Student | Major/Age column | Major/Age values | |
0 | Alice | Major | Biology |
1 | Mary | Major | CS |
2 | Bob | Major | CS |
3 | Alice | Age | 21 |
4 | Mary | Age | 20 |
5 | Bob | Age | 25 |
We name the “variable
” column "Major/Age column"
and the “value
” column "Major/Age values"
.
It is critical to label these columns properly, so we know what they contain.
Summary
All in all, we learned how to use the melt()
function and how to apply its various parameters. We saw how we can unpivot one or more columns, how to handle indexes, how to set multiple identifier columns, and how to label the newly created columns.
The melt()
function allows us to unpivot our data frames in several ways.
For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter blog page.
Happy Coding!