Pandas DataFrame Function Application – Part 1

The Pandas DataFrame offers several function application, GroupBy, and window methods. Each of these methods takes a DataFrame as input and returns a transformed or summarized result.

Part 1 of this series focuses on Function Applications and delves into each of the following methods.


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal. At the command prompt, execute the commands below. The terminal used in this example shows a dollar sign ($) as its prompt. Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import numpy as np 

DataFrame apply()

The apply() method applies a function along an axis of a DataFrame. This method returns a Series or a DataFrame, depending on the function and on the axis and result_type parameters described below.

The syntax for this method is as follows:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
Parameter     Description
func          The function applied to each column or each row, depending on the axis selected.
axis          If zero (0) or index, apply the function to each column. By default, 0.
              If one (1) or columns, apply the function to each row.
raw           Determines whether each row or column is passed as a Series or an ndarray.
              False passes each row/column as a Series to the function. By default, False.
              True passes an ndarray instead. This option can perform much better when func is a NumPy reducing function.
result_type   This parameter applies only when the axis parameter equals 1 (columns).
              'expand' converts list-like results to columns.
              'reduce' returns a Series where possible rather than expanding list-like results: the opposite of 'expand'.
              'broadcast' broadcasts results to the original DataFrame shape. Index and columns remain the same.
              By default, None.
args          Positional arguments to pass to func in addition to the row or column, given as a tuple. By default, an empty tuple.
**kwargs      Additional keyword arguments to pass to func.
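
As a minimal sketch of how result_type changes what apply() returns, the snippet below uses a hypothetical helper named low_high (not part of Pandas or of the Rivers Clothing examples) that returns a list for each row. The column names Low and High are also made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops':  [10.12, 12.23, 13.95],
                   'Tanks': [11.35, 13.45, 14.98]},
                  index=['Small', 'Medium', 'Large'])

# Hypothetical helper: returns the lowest and highest price in a row as a list.
def low_high(row):
    return [row.min(), row.max()]

# With the default result_type=None, a list return yields a Series of lists.
as_lists = df.apply(low_high, axis=1)

# With result_type='expand', each list is spread across columns instead.
expanded = df.apply(low_high, axis=1, result_type='expand')
expanded.columns = ['Low', 'High']  # assumed labels, chosen for readability

print(as_lists)
print(expanded)
```

The first call keeps one list per size, while the second produces a DataFrame with one column per list position.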

Rivers Clothing has completed a market analysis of its product pricing. They have determined that pricing on Tops and Tanks falls well below the profit margins of their other lines. Use the apply() method with a lambda function to update these prices.

Code – Example 1

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

result = df.apply(lambda x: x*2 if x.name in ['Tops', 'Tanks'] else x)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
  • Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
  • Line [3-4] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [5] uses the apply() method with a lambda. This line multiplies each element in Tops and Tanks by two (2). The output saves to the result variable. Other pricing remains unchanged.
  • Line [6] outputs the result to the terminal.

Output

          Tops   Tanks   Pants  Sweats
Small   $20.24  $22.70  $21.37  $27.15
Medium  $24.46  $26.90  $56.99  $21.85
Large   $27.90  $29.96  $94.87  $35.75

This example uses the apply() method with np.sum. This code totals the prices in each product column.

Code – Example 2

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format

result = df.apply(np.sum, axis=0)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
  • Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
  • Line [3] uses apply() with np.sum and sums the product prices along the Column axis. The output saves to the result variable.
  • Line [4] outputs the result to the terminal.

Output

Tops      $36.30
Tanks     $39.78
Pants    $173.23
Sweats    $84.75
dtype: float64
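
The two examples above work column by column. As a rough sketch of the other direction, the snippet below applies a function to each row (axis=1) and passes an extra value through the args parameter. The helper row_total and its tax_rate parameter are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops':   [10.12, 12.23, 13.95],
                   'Tanks':  [11.35, 13.45, 14.98],
                   'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# Hypothetical helper: sums one row and adds tax, rounded to two decimals.
def row_total(row, tax_rate):
    return round(row.sum() * (1 + tax_rate), 2)

# axis=1 hands each row to row_total; args supplies tax_rate positionally.
totals = df.apply(row_total, axis=1, args=(0.05,))
print(totals)

# The same extra value can be passed by keyword instead.
totals_kw = df.apply(row_total, axis=1, tax_rate=0.05)
```

Note the trailing comma in args=(0.05,): args expects a tuple, even for a single value.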

DataFrame applymap()

The applymap() method applies a function element-wise to a DataFrame. This method returns a transformed DataFrame. As of Pandas 2.1, applymap() is deprecated in favor of the equivalent DataFrame.map() method.

The syntax for this method is as follows:

DataFrame.applymap(func, na_action=None, **kwargs)
Parameter   Description
func        A callable that takes a single value and returns a single value.
na_action   The options are None and 'ignore'. 'ignore' propagates NaN values without passing them to func. By default, None.
**kwargs    Additional keyword arguments to pass to func.

For this example, any item priced at 13.45 has an 'M' appended to the end. This initial indicates that the item price needs to be adjusted. The M stands for Modify.

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

result = df.applymap(lambda x: str(x) + 'M' if x == 13.45 else x)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
  • Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
  • Line [3-4] creates and sets the index for the DataFrame.
  • Line [5] uses applymap() with a lambda to search for the price of 13.45. If found, an 'M' appends to the end. The output saves to the result variable.
  • Line [6] outputs the result to the terminal.

Output

          Tops   Tanks   Pants  Sweats
Small   $10.12  $11.35  $21.37  $27.15
Medium  $12.23  13.45M  $56.99  $21.85
Large   $13.95  $14.98  $94.87  $35.75
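
The na_action parameter matters once a DataFrame contains missing values. The sketch below is an illustration only: the NaN entries are made-up data, and the getattr line is an assumption that picks DataFrame.map() where it exists (Pandas 2.1 and later) and falls back to applymap() otherwise.

```python
import numpy as np
import pandas as pd

# Made-up data: two prices are missing.
df = pd.DataFrame({'Tops':  [10.12, np.nan, 13.95],
                   'Tanks': [11.35, 13.45, np.nan]},
                  index=['Small', 'Medium', 'Large'])

# Use DataFrame.map when available, otherwise the older applymap.
elementwise = getattr(df, 'map', None) or df.applymap

# With na_action='ignore', NaN cells are never passed to the lambda and stay NaN.
labels = elementwise(lambda x: f'${x:.2f}', na_action='ignore')
print(labels)
```

Without na_action='ignore', the lambda would also receive the NaN cells and turn them into text.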

DataFrame pipe()

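The pipe() method passes the entire DataFrame (or Series) to a function as its first argument and returns whatever that function returns. This makes it convenient to chain custom functions together with other DataFrame methods.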

The syntax for this method is as follows:

DataFrame.pipe(func, *args, **kwargs)
Parameter   Description
func        The function to apply to the Series/DataFrame. The args and **kwargs arguments are passed to this function.
args        Optional positional arguments passed to func.
**kwargs    Optional keyword arguments passed to func.

Rivers Clothing realized the pricing for Pants is a little too high and needs adjusting. The pipe method with a custom function is perfect for performing this price adjustment!

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

def change_price(x):
    x['Pants'] = [21.50, 36.95, 55.72]
    return x

result = df.pipe(change_price)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
  • Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
  • Line [3-4] creates and sets the index for the DataFrame.
  • Line [5-7] creates the change_price function. This function changes the price for each item in the Pants category.
  • Line [8] passes df to the change_price function through pipe() and saves the output to the result variable.
  • Line [9] outputs the result to the terminal.

Output

          Tops   Tanks   Pants  Sweats
Small   $10.12  $11.35  $21.50  $27.15
Medium  $12.23  $13.45  $36.95  $21.85
Large   $13.95  $14.98  $55.72  $35.75
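
Because pipe() forwards extra arguments to the function, several steps can be chained in one expression. The following is a minimal sketch under stated assumptions: the helpers discount and add_total, the 10 percent figure, and the Outfit column name are all hypothetical. Each helper works on a copy so that the original df is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'Tops':   [10.12, 12.23, 13.95],
                   'Tanks':  [11.35, 13.45, 14.98],
                   'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# Hypothetical helper: reduces one column by a percentage.
def discount(frame, column, percent):
    out = frame.copy()  # work on a copy, not on the caller's DataFrame
    out[column] = (out[column] * (1 - percent / 100)).round(2)
    return out

# Hypothetical helper: appends a column holding each row's total.
def add_total(frame, name='Total'):
    out = frame.copy()
    out[name] = out.sum(axis=1)
    return out

# Positional and keyword arguments after the function are forwarded to it.
result = (df.pipe(discount, 'Pants', percent=10)
            .pipe(add_total, name='Outfit'))
print(result)
```

Written without pipe(), the same steps would read inside out as add_total(discount(df, 'Pants', percent=10), name='Outfit').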

DataFrame agg() & aggregate()

The DataFrame agg() and aggregate() methods are identical: agg() is an alias for aggregate(). Both apply one or more aggregation operations along the specified axis.

This method can return one of the following:

  • Scalar: when the Series.agg method is called with a single function.
  • Series: when the code calls the DataFrame.agg method and uses a single function.
  • DataFrame: when the DataFrame.agg method is called with several functions.
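
The three return types listed above can be checked with a short sketch. It reuses the Rivers Clothing prices, and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops':   [10.12, 12.23, 13.95],
                   'Tanks':  [11.35, 13.45, 14.98],
                   'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

scalar = df['Tops'].agg('max')    # Series.agg with one function: a scalar
series = df.agg('max')            # DataFrame.agg with one function: a Series
frame = df.agg(['min', 'max'])    # DataFrame.agg with several functions: a DataFrame

print(type(scalar).__name__, type(series).__name__, type(frame).__name__)
```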

The syntax for this method is as follows:

DataFrame.agg(func=None, axis=0, *args, **kwargs)
Parameter   Description
func        The function(s) used to aggregate the data: a function, a function name as a string, a list of either, or a dictionary mapping axis labels to them.
axis        If zero (0) or index, apply the function to each column. By default, 0. If one (1) or columns, apply the function to each row.
args        Optional positional arguments passed to func.
**kwargs    Optional keyword arguments passed to func.

For this example, Rivers Clothing needs to determine its highest and lowest priced items.  

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format

result = df.agg(['min', 'max'])
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
  • Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
  • Line [3] retrieves the min and max prices for each column. This output saves to the result variable.
  • Line [4] outputs the result to the terminal.

Output

Comparing the output below with the DataFrame shows that size Large has the highest price in every product line. Size Small has the lowest price in every line except Sweats, where Medium ($21.85) is the lowest.

       Tops   Tanks   Pants  Sweats
min  $10.12  $11.35  $21.37  $21.85
max  $13.95  $14.98  $94.87  $35.75
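
The func parameter also accepts a dictionary, which allows a different aggregation per column. The sketch below is one possible variation on the example above, not part of the original Rivers Clothing analysis. It also uses idxmin() and idxmax(), which are separate DataFrame methods rather than part of agg(), to report which size holds each lowest and highest price.

```python
import pandas as pd

df = pd.DataFrame({'Tops':   [10.12, 12.23, 13.95],
                   'Tanks':  [11.35, 13.45, 14.98],
                   'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# A dictionary maps each column label to the aggregation(s) wanted for it.
# Cells for combinations that were not requested are filled with NaN.
summary = df.agg({'Tops': ['min', 'max'], 'Pants': ['mean']})
print(summary)

# idxmin()/idxmax() return the row label (size) of each column's extreme value.
lowest = df.idxmin()
highest = df.idxmax()
print(lowest)
print(highest)
```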