Pandas melt(), explode(), squeeze(), to_xarray(), transpose()

The Pandas DataFrame/Series has several methods to re-shape, sort, and transpose the data.

Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

The Pandas library enables access to/from a DataFrame.
The Xarray library works with labeled multi-dimensional arrays and advanced analytics.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install xarray

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to Install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import xarray

DataFrame melt()

The melt() method unpivots a DataFrame/Series from the standard wide (horizontal) format to long (vertical).

The syntax for this method is as follows:

DataFrame.melt(id_vars=None, value_vars=None, var_name=None, 
               value_name='value', col_level=None, ignore_index=True)

Parameter	Description
`id_vars`	This parameter is the column(s) to use for the identifier vars. Allowed `dtypes` are: tuple, list, or ndarray.
`value_vars`	The column(s) name(s) to unpivot. If empty, use all columns. Allowed `dtypes` are: tuple, list, or ndarray.
`var_name`	The name(s) for the variable column(s).
`value_name`	The name(s) for the value column(s).
`col_level`	If `MultiIndex`, use this parameter to melt.
`ignore_index`	If `True`, ignore the original index. If `False`, use the original index.

For this example, four (4) students enroll in coding classes. The output will display in various formats using the melt() method.

Code – Example 1

df = pd.DataFrame({'Enrolled': {0: '01/01/2021', 1: '01/02/2021',  2: '01/29/2021', 3: '01/13/2021'},
                   'Student':  {0: 'Micah',      1: 'Philip',   2: 'Jonas',         3: 'Sarah'},
                   'Class':    {0: 'Python',     1: 'PHP',     2: 'JavaScript', 3: 'Java'},
                   'Grade':    {0: 65, 1: 76, 2: 55, 3: 81}})
print(df)

result = pd.melt(df, id_vars =['Student'], value_vars =['Class', 'Grade'])
print(result)

Line [1] creates a DataFrame from a dictionary containing student details and saves it to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] sets the id_vars to Student and the value_vars to Class and Grade. The output saves to result.
Line [4] outputs the result to the terminal.

Output

df

	Enrolled	Student	Class	Grade
0	01/01/2021	Micah	Python	65
1	01/02/2021	Philip	PHP	76
2	01/29/2021	Jonas	JavaScript	55
3	01/13/2021	Sarah	Java	81

result

	Student	variable	value
0	Micah	Class	Python
1	Philip	Class	PHP
2	Jonas	Class	JavaScript
3	Sarah	Class	Java
4	Micah	Grade	65
5	Philip	Grade	76
6	Jonas	Grade	55
7	Sarah	Grade	81

💡 Note: From the code on line [3] and the result, the melt() method unpivots the DataFrame around the Student column (id_vars =['Student']).

For this example, the column labels (shown above) will update. Previously the default column labels were named variable and value (see example 1).

Below the column labels change.

Code – Example 2

df = pd.DataFrame({'Enrolled': {0: '01/01/2021', 1: '01/02/2021',  2: '01/29/2021', 3: '01/13/2021'},
                   'Student':  {0: 'Micah',      1: 'Philip',   2: 'Jonas',         3: 'Sarah'},
                   'Class':    {0: 'Python',     1: 'PHP',     2: 'JavaScript', 3: 'Java'},
                   'Grade':    {0: 65, 1: 76, 2: 55, 3: 81}})

result = pd.melt(df, id_vars =['Student'], value_vars =['Class', 'Grade'],
                 var_name ='All-Students', value_name ='All-Grades')
print(result)

Line [1] creates a DataFrame from a dictionary containing student details and saves it to df.
Line [2] sets the id_vars to Student and the value_vars to Class and Grade. This line also changes the default names variable and value to All-Students and All-Grades. The output saves to result.
Line [3] outputs the result to the terminal.

result

	Student	All Students	All Grades
0	Micah	Class	Python
1	Philip	Class	PHP
2	Jonas	Class	JavaScript
3	Sarah	Class	Java
	Micah	Grade	65
5	Philip	Grade	76
6	Jonas	Grade	55
7	Sarah	Grade	81

DataFrame explode()

The explode() method reshapes (transforms) the DataFrame/Series elements from a list-like scenario to a row format.

The syntax for this method is as follows:

DataFrame.explode(column, ignore_index=False)

`column`	This parameter is the column(s) to explode (string, tuple, or list format).
`ignore_index`	If `True`, the index labels will be 0, 1, 2, etc. If `False`, the set index displays.

For this example, a public school conducts three (3) classes per day for three (3) grades (6-8).

Each grade contains a list with the number of students per class. For example, the Grade-7 class explodes to display each element inside the list in a long (vertical) format.

df = pd.DataFrame({'Grade-6': [[15, 11, 10]],
                   'Grade-7': [[12, 16, 11]],
                   'Grade-8': [[20, 18, 19]]})
print(df)

result = df.explode('Grade-7')
print(result)

Line [1] creates a DataFrame from a Dictionary containing class details and saves it to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] explodes the format for Grade-7. This output saves to result.
Line [4] outputs the result to the terminal.

Output

df

	Grade-6	Grade-7	Grade-8
0	[15, 11, 10]	[12, 16, 11]	[20, 18, 19]

result

	Grade-6	Grade-7	Grade-8
0	[15, 11, 10]	12	[20, 18, 19]
0	[15, 11, 10]	16	[20, 18, 19]
0	[15, 11, 10]	11	[20, 18, 19]

DataFrame squeeze()

The squeeze() method compresses a one-dimensional DataFrame/Series axis into a Series.

💡 Note: Squeezing objects containing more than one element per axis does not change the original DataFrame/Series. This method is most effective when used with a DataFrame.

The syntax for this method is as follows:

DataFrame.squeeze(axis=None)

Parameter	Description
`axis`	If zero (0) or index is selected, apply to each column. Default is 0 (column). If zero (1) or columns, apply to each row.

For this example, we have two (2) classical composers. Each composer contains a list with their total number of Preludes and Nocturnes. The DataFrame squeezes to display the details for Chopin.

Code – Example 1

df = pd.DataFrame([[24, 18], 
                   [4, 21]], 
                   columns=['Debussy', 'Chopin'])
print(df)

col = df[['Chopin']]
result = col.squeeze('columns')
print(result)

Line [1] creates a DataFrame, assigns the column names, and saves it to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] slices out the column containing Chopin’s composition details and saves it to col.
Line [4] squeezes the column. The output saves to result.
Line [5] outputs the result to the terminal.

Output

df

	Debussy	Chopin
0	24	18
1	4	21

result

0	18
1	21
Name: Chopin, dtype: int64

DataFrame to_xarray()

The to_xarray() method returns an xarray. Dataset/xarray.DataArray from a DataFrame/Series dependent on the object passed.

💡 Note: If the object is a DataFrame, the data converts to a Dataset.

If a Series, the data converts to a DataArray.

The syntax for this method is as follows:

DataFrame.to_xarray()

This method has no parameters.

For this example, Alice needs to grocery shop for a party. This DataFrame displays the grocery list (including categories, prices, and amounts). This object then converts to a Dataset.

Code – Example 1

df = pd.DataFrame([('Fruits',          'Apples',    3.97,   4),
                   ('Dairy',           'Milk',       2.43,   4),
                   ('Vegetables', 'Carrots',  4.21,  12),
                   ('Meat',           'Steak',    18.57, 4)],
                   columns=['Category', 'Name', 'Price', 'Quantity'])
print(df)

result = df.to_xarray()
print(result)

Line [1] creates a DataFrame from a list of tuples containing grocery list details and saves it to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] converts the DataFrame to a Dataset. The output saves to result.
Line [4] outputs the result to the terminal.

Output

df

	Category	Name	Price	Quantity
0	Fruits	Apples	3.97	4
1	Dairy	Milk	2.43	4
2	Vegetables	Carrots	4.21	12
3	Meat	Steak	18.57	4

result

Dimensions:   	(index: 4)
Coordinates:
  * index     	(index) int64 0 1 2 3
Data variables:
    Category	(index) object 'Fruits' 'Dairy' 'Vegetables' 'Meat'
    Name      	(index) object 'Apples' 'Milk' 'Carrots' 'Steak'
    Price     		(index) float64 3.97 2.43 4.21 18.57
    Quantity		(index) int64 4 4 12 4

You could also extract one (1) column from a DataFrame.

Code – Example 2

df = pd.DataFrame([('Fruits', 'Apples', 3.97, 4),
                   ('Dairy', 'Milk', 2.43, 4),
                   ('Vegetables', 'Carrots', 4.21, 12),
                   ('Meat', 'Steak', 18.57, 4)],
                   columns=['Category', 'Name', 'Price', 'Quantity'])

result = df['Name'].to_xarray()
print(result)

Line [1] creates a DataFrame from a List of Tuples containing grocery list details and saves it to df.
Line [2] converts a column of the DataFrame to a Dataset. The output saves to result.
Line [3] outputs the result to the terminal.

Output

<xarray.DataArray 'Name' (index: 4)>
array(['Apples', 'Milk', 'Carrots', 'Steak'], dtype=object)
Coordinates:
  * index    (index) int64 0 1 2 3

This example converts a Series containing the first six (6) months of the year and the days per month to an xarray.

Code – Example 3

my_series = pd.Series(['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May', 'June'])
my_idx    = [31, 28, 31, 30, 31, 30]
my_series.index = my_idx
print(my_series)

result = my_series.to_xarray()
print(result)

Line [1] creates a Series from a list of months.
Line [2] creates and sets the indexes as a list of days for each month.
Line [3] sets the index for the Series.
Line [4] outputs the Series to the terminal.
Line [5] converts the Series to an xarray.
Line [6] outputs the result to the terminal.

Output

my_series

31	Jan.
28	Feb.
31	Mar.
30	Apr.
31	May
30	June

result

dtype: object
<xarray.DataArray (index: 6)>
array(['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May', 'June'], dtype=object)
Coordinates:
  * index    (index) int64 31 28 31 30 31 30

DataFrame T & transpose()

The T or transpose() method switches (transposes) the index and columns.

The syntax for this method is as follows:

DataFrame.transpose(*args, copy=False)

`*args`	This parameter is for compatibility with NumPy.
`copy`	If `True`, the transformation occurs on a copy of the DataFrame/Series. If `False`, the transformation updates the original. This parameter is `False`, by default.

For this example, the countries.csv file reads in.

💡 Note: Click here to download the CSV file. Move to the current working directory.

df = pd.read_csv('countries.csv').head(3)
print(df)

result1 = df.T
print(result1)

result2 = df.transpose()
print(result2)

Line [1] reads in the top three (3) rows of the comma-delimited CSV file. The output saves to df.
Line [2] outputs the DataFrame to the terminal.
Line [3] uses the T method to transpose the DataFrame. The output saves to result1.
Line [4] outputs result1 to the terminal.
Line [5] uses the transpose() method to transpose the DataFrame. The output saves to result2.
Line [6] outputs result2 to the terminal.

Output

df

	Country	Capital	Population	Area
0	Germany	Berlin	83783942	357021
1	France	Paris	67081000	551695
2	Spain	Madrid	47431256	498511

result1

	0	1	2
Country	Germany	France	Spain
Capital	Berlin	Paris	Madrid
Population	83783942	67081000	47431256
Area	357021	551695	498511

result2

	0	1	2
Country	Germany	France	Spain
Capital	Berlin	Paris	Madrid
Population	83783942	67081000	47431256
Area	357021	551695	498511

💡 Note: The output from result1 and result2 are identical.

Further Learning Resources

This is Part 15 of the DataFrame method series.

Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickles().
Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
Part 25 focuses on the serialization and conversion methods to_bgq() and to_coo().

Also, have a look at the Pandas DataFrame methods cheat sheet!