Pandas melt(), explode(), squeeze(), to_xarray(), transpose()

5/5 - (2 votes)

The Pandas DataFrame/Series has several methods to re-shape, sort, and transpose the data.


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Xarray library works with labeled multi-dimensional arrays and advanced analytics.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install xarray

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import xarray

DataFrame melt()

The melt() method unpivots a DataFrame/Series from the standard wide (horizontal) format to long (vertical).

Pandas Melt | pd.melt() | df.melt()

The syntax for this method is as follows:

DataFrame.melt(id_vars=None, value_vars=None, var_name=None, 
               value_name='value', col_level=None, ignore_index=True)
ParameterDescription
id_varsThis parameter is the column(s) to use for the identifier vars. Allowed dtypes are: tuple, list, or ndarray.
value_varsThe column(s) name(s) to unpivot. If empty, use all columns. Allowed dtypes are: tuple, list, or ndarray.
var_nameThe name(s) for the variable column(s).
value_nameThe name(s) for the value column(s).
col_levelIf MultiIndex, use this parameter to melt.
ignore_indexIf True, ignore the original index. If False, use the original index.

For this example, four (4) students enroll in coding classes. The output will display in various formats using the melt() method.

Code – Example 1

df = pd.DataFrame({'Enrolled': {0: '01/01/2021', 1: '01/02/2021',  2: '01/29/2021', 3: '01/13/2021'},
                   'Student':  {0: 'Micah',      1: 'Philip',   2: 'Jonas',         3: 'Sarah'},
                   'Class':    {0: 'Python',     1: 'PHP',     2: 'JavaScript', 3: 'Java'},
                   'Grade':    {0: 65, 1: 76, 2: 55, 3: 81}})
print(df)

result = pd.melt(df, id_vars =['Student'], value_vars =['Class', 'Grade'])
print(result)
  • Line [1] creates a DataFrame from a dictionary containing student details and saves it to df.
  • Line [2] outputs the DataFrame to the terminal.
  • Line [3] sets the id_vars to Student and the value_vars to Class and Grade. The output saves to result.
  • Line [4] outputs the result to the terminal.

Output

df

 EnrolledStudentClassGrade
001/01/2021Micah     Python    65
101/02/2021 Philip        PHP    76
201/29/2021  Jonas JavaScript    55
301/13/2021  Sarah       Java    81

result

 Studentvariablevalue
0Micah     Class    Python
1Philip        Class    PHP
2Jonas Class    JavaScript
3Sarah       Class    Java
4Micah     Grade    65
5Philip        Grade    76
6Jonas Grade    55
7Sarah       Grade    81

πŸ’‘ Note: From the code on line [3] and the result, the melt() method unpivots the DataFrame around the Student column (id_vars =['Student']).

For this example, the column labels (shown above) will update. Previously the default column labels were named variable and value (see example 1).

Below the column labels change.

Code – Example 2

df = pd.DataFrame({'Enrolled': {0: '01/01/2021', 1: '01/02/2021',  2: '01/29/2021', 3: '01/13/2021'},
                   'Student':  {0: 'Micah',      1: 'Philip',   2: 'Jonas',         3: 'Sarah'},
                   'Class':    {0: 'Python',     1: 'PHP',     2: 'JavaScript', 3: 'Java'},
                   'Grade':    {0: 65, 1: 76, 2: 55, 3: 81}})

result = pd.melt(df, id_vars =['Student'], value_vars =['Class', 'Grade'],
                 var_name ='All-Students', value_name ='All-Grades')
print(result)
  • Line [1] creates a DataFrame from a dictionary containing student details and saves it to df.
  • Line [2] sets the id_vars to Student and the value_vars to Class and Grade. This line also changes the default names variable and value to All-Students and All-Grades. The output saves to result.
  • Line [3] outputs the result to the terminal.

result

 StudentAll StudentsAll Grades
0Micah     Class    Python
1Philip        Class    PHP
2Jonas Class    JavaScript
3Sarah       Class    Java
 Micah     Grade    65
5Philip        Grade    76
6Jonas Grade    55
7Sarah       Grade    81

DataFrame explode()

The explode() method reshapes (transforms) the DataFrame/Series elements from a list-like scenario to a row format.

How to unnest (explode) a column in a pandas DataFrame?

The syntax for this method is as follows:

DataFrame.explode(column, ignore_index=False)
columnThis parameter is the column(s) to explode (string, tuple, or list format).
ignore_indexIf True, the index labels will be 0, 1, 2, etc. If False, the set index displays.

For this example, a public school conducts three (3) classes per day for three (3) grades (6-8).

Each grade contains a list with the number of students per class. For example, the Grade-7 class explodes to display each element inside the list in a long (vertical) format.

df = pd.DataFrame({'Grade-6': [[15, 11, 10]],
                   'Grade-7': [[12, 16, 11]],
                   'Grade-8': [[20, 18, 19]]})
print(df)

result = df.explode('Grade-7')
print(result)
  • Line [1] creates a DataFrame from a Dictionary containing class details and saves it to df.
  • Line [2] outputs the DataFrame to the terminal.
  • Line [3] explodes the format for Grade-7. This output saves to result.
  • Line [4] outputs the result to the terminal.

Output

df

 Grade-6Grade-7Grade-8
0[15, 11, 10][12, 16, 11] [20, 18, 19]

result

 Grade-6Grade-7Grade-8
0[15, 11, 10]12 [20, 18, 19]
0[15, 11, 10]16[20, 18, 19]
0[15, 11, 10]11[20, 18, 19]

DataFrame squeeze()

The squeeze() method compresses a one-dimensional DataFrame/Series axis into a Series.

πŸ’‘ Note: Squeezing objects containing more than one element per axis does not change the original DataFrame/Series. This method is most effective when used with a DataFrame.

The syntax for this method is as follows:

DataFrame.squeeze(axis=None)
ParameterDescription
axisIf zero (0) or index is selected, apply to each column. Default is 0 (column). If zero (1) or columns, apply to each row.

For this example, we have two (2) classical composers. Each composer contains a list with their total number of Preludes and Nocturnes. The DataFrame squeezes to display the details for Chopin.

Code – Example 1

df = pd.DataFrame([[24, 18], 
                   [4, 21]], 
                   columns=['Debussy', 'Chopin'])
print(df)

col = df[['Chopin']]
result = col.squeeze('columns')
print(result)
  • Line [1] creates a DataFrame, assigns the column names, and saves it to df.
  • Line [2] outputs the DataFrame to the terminal.
  • Line [3] slices out the column containing Chopin’s composition details and saves it to col.
  • Line [4] squeezes the column. The output saves to result.
  • Line [5] outputs the result to the terminal.

Output

df

 DebussyChopin
02418
1421

result

018
121
Name: Chopin, dtype: int64

DataFrame to_xarray()

The to_xarray() method returns an xarray. Dataset/xarray.DataArray from a DataFrame/Series dependent on the object passed.

πŸ’‘ Note: If the object is a DataFrame, the data converts to a Dataset.

If a Series, the data converts to a DataArray.

The syntax for this method is as follows:

DataFrame.to_xarray()

This method has no parameters.

For this example, Alice needs to grocery shop for a party. This DataFrame displays the grocery list (including categories, prices, and amounts). This object then converts to a Dataset.

Code – Example 1

df = pd.DataFrame([('Fruits',          'Apples',    3.97,   4),
                   ('Dairy',           'Milk',       2.43,   4),
                   ('Vegetables', 'Carrots',  4.21,  12),
                   ('Meat',           'Steak',    18.57, 4)],
                   columns=['Category', 'Name', 'Price', 'Quantity'])
print(df)

result = df.to_xarray()
print(result)
  • Line [1] creates a DataFrame from a list of tuples containing grocery list details and saves it to df.
  • Line [2] outputs the DataFrame to the terminal.
  • Line [3] converts the DataFrame to a Dataset. The output saves to result.
  • Line [4] outputs the result to the terminal.

Output

df

 CategoryNamePriceQuantity
0FruitsApples3.974
1DairyMilk2.434
2VegetablesCarrots4.2112
3MeatSteak18.574

result

Dimensions:   	(index: 4)
Coordinates:
  * index     	(index) int64 0 1 2 3
Data variables:
    Category	(index) object 'Fruits' 'Dairy' 'Vegetables' 'Meat'
    Name      	(index) object 'Apples' 'Milk' 'Carrots' 'Steak'
    Price     		(index) float64 3.97 2.43 4.21 18.57
    Quantity		(index) int64 4 4 12 4

You could also extract one (1) column from a DataFrame.

Code  – Example 2

df = pd.DataFrame([('Fruits', 'Apples', 3.97, 4),
                   ('Dairy', 'Milk', 2.43, 4),
                   ('Vegetables', 'Carrots', 4.21, 12),
                   ('Meat', 'Steak', 18.57, 4)],
                   columns=['Category', 'Name', 'Price', 'Quantity'])

result = df['Name'].to_xarray()
print(result)
  • Line [1] creates a DataFrame from a List of Tuples containing grocery list details and saves it to df.
  • Line [2] converts a column of the DataFrame to a Dataset. The output saves to result.
  • Line [3] outputs the result to the terminal.

Output

<xarray.DataArray 'Name' (index: 4)>
array(['Apples', 'Milk', 'Carrots', 'Steak'], dtype=object)
Coordinates:
  * index    (index) int64 0 1 2 3

This example converts a Series containing the first six (6) months of the year and the days per month to an xarray.

Code – Example 3

my_series = pd.Series(['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May', 'June'])
my_idx    = [31, 28, 31, 30, 31, 30]
my_series.index = my_idx
print(my_series)

result = my_series.to_xarray()
print(result)
  • Line [1] creates a Series from a list of months.
  • Line [2] creates and sets the indexes as a list of days for each month.
  • Line [3] sets the index for the Series.
  • Line [4] outputs the Series to the terminal.
  • Line [5] converts the Series to an xarray.
  • Line [6] outputs the result to the terminal.

Output

my_series

31Jan.
28Feb.
31Mar.
30Apr.
31May
30June

result

dtype: object
<xarray.DataArray (index: 6)>
array(['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May', 'June'], dtype=object)
Coordinates:
  * index    (index) int64 31 28 31 30 31 30

DataFrame T & transpose()

The T or transpose() method switches (transposes) the index and columns.

The syntax for this method is as follows:

DataFrame.transpose(*args, copy=False)
*argsThis parameter is for compatibility with NumPy.
copyIf True, the transformation occurs on a copy of the DataFrame/Series. If False, the transformation updates the original. This parameter is False, by default.

For this example, the countries.csv file reads in.

πŸ’‘ Note: Click here to download the CSV file. Move to the current working directory.

df = pd.read_csv('countries.csv').head(3)
print(df)

result1 = df.T
print(result1)

result2 = df.transpose()
print(result2)
  • Line [1] reads in the top three (3) rows of the comma-delimited CSV file. The output saves to df.
  • Line [2] outputs the DataFrame to the terminal.
  • Line [3] uses the T method to transpose the DataFrame. The output saves to result1.
  • Line [4] outputs result1 to the terminal.
  • Line [5] uses the transpose() method to transpose the DataFrame. The output saves to result2.
  • Line [6] outputs result2 to the terminal.

Output

df

 CountryCapitalPopulationArea
0GermanyBerlin83783942 357021
1FranceParis67081000 551695
2SpainMadrid47431256 498511

result1

 012
CountryGermanyFranceSpain
CapitalBerlinParisMadrid
Population83783942 67081000 47431256 
Area357021551695498511

result2

 012
CountryGermanyFranceSpain
CapitalBerlinParisMadrid
Population83783942 67081000 47431256 
Area357021551695498511

πŸ’‘ Note: The output from result1 and result2 are identical.


Further Learning Resources

This is Part 15 of the DataFrame method series.

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
  • Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
  • Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
  • Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
  • Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
  • Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
  • Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
  • Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
  • Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
  • Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
  • Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
  • Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickles().
  • Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
  • Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
  • Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
  • Part 25 focuses on the serialization and conversion methods to_bgq() and to_coo().

Also, have a look at the Pandas DataFrame methods cheat sheet!