Pandas DataFrame Function Application - Part 1 - Be on the Right Side of Change

Pandas provides several DataFrame methods for function application, GroupBy operations, and windowing. Each of these methods applies a function to a DataFrame and returns the transformed or aggregated result.

Part 1 of this series focuses on function application and covers the following methods: apply(), applymap(), pipe(), and agg()/aggregate().

Preparation

Before any data manipulation can occur, two (2) libraries must be installed.

The Pandas library provides the DataFrame and the tools to read, write, and manipulate its data.
The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
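One quick way to confirm that both libraries import correctly is to print their version strings. This is only a minimal check, not part of the original walkthrough; the versions printed depend on your environment.

```python
import pandas as pd
import numpy as np

# If either import fails, the corresponding library is not installed
# in the interpreter that runs this script.
print(pd.__version__)
print(np.__version__)
```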

Feel free to view the PyCharm installation guide for the required library.

How to install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import numpy as np

DataFrame apply()

The apply() method applies a function along an axis of a DataFrame. The function receives each column (or each row) in turn, as determined by the axis parameter described below, and the method returns a Series or DataFrame built from the results.

The syntax for this method is as follows:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

Parameter	Description
`func`	The function applied to each column or each row, depending on the axis selected.
`axis`	If zero (0) or 'index', apply the function to each column. If one (1) or 'columns', apply the function to each row. By default, 0.
`raw`	Determines whether each row or column is passed as a Series or an `ndarray`. `False` (the default) passes each row/column as a Series. `True` passes an `ndarray` instead, which can perform much better when `func` is a NumPy reducing function.
`result_type`	Applies only when the `axis` parameter equals 1 (columns). 'expand': list-like results are converted to columns. 'reduce': returns a Series if possible rather than expanding list-like results; the opposite of 'expand'. 'broadcast': results are broadcast to the original shape of the DataFrame; the index and columns remain the same. By default, `None`.
`args`	Positional arguments to pass to `func` in addition to the row/column. By default, an empty tuple.
`**kwargs`	Additional keyword arguments to pass to `func`.
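The sketch below illustrates the axis, args, and result_type parameters on a reduced version of the Rivers Clothing data. The price_range helper and the 'low'/'high' column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops':  [10.12, 12.23, 13.95],
                   'Tanks': [11.35, 13.45, 14.98]},
                  index=['Small', 'Medium', 'Large'])

# Hypothetical helper: the spread between the highest and lowest
# price in a row, scaled by a factor.
def price_range(row, factor):
    return (row.max() - row.min()) * factor

# axis=1 passes each row as a Series; args supplies the extra positional argument.
row_ranges = df.apply(price_range, axis=1, args=(2,))

# result_type='expand' converts each list-like return value into columns.
low_high = df.apply(lambda row: [row.min(), row.max()],
                    axis=1, result_type='expand')
low_high.columns = ['low', 'high']

print(row_ranges)
print(low_high)
```

With axis=1 the function sees one size (row) at a time, so the result is indexed by Small/Medium/Large rather than by product.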

Rivers Clothing has completed a market analysis of its product pricing. They have determined that pricing on Tops and Tanks falls well below the profit margins of their other lines. Use the apply() method with a lambda function to update these prices.

Code – Example 1

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

result = df.apply(lambda x: x*2 if x.name in ['Tops', 'Tanks'] else x)
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
Line [3-4] creates and sets the index for the DataFrame (Small/Medium/Large).
Line [5] uses the apply() method with a lambda. This line multiplies each element in Tops and Tanks by two (2). The output saves to the result variable. Other pricing remains unchanged.
Line [6] outputs the result to the terminal.

Output

	Tops	Tanks	Pants	Sweats
Small	$20.24	$22.70	$21.37	$27.15
Medium	$24.46	$26.90	$56.99	$21.85
Large	$27.90	$29.96	$94.87	$35.75

This example uses the apply() method and np.sum. This code calculates the sum of all amounts held in Inventory based on product type.

Code – Example 2

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format

result = df.apply(np.sum, axis=0)
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
Line [3] uses apply() with np.sum and sums the product prices along the Column axis. The output saves to the result variable.
Line [4] outputs the result to the terminal.

Output

Tops	$36.30
Tanks	$39.78
Pants	$173.23
Sweats	$84.75
dtype: float64

DataFrame applymap()

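The applymap() method applies a function element-wise to a DataFrame. This method returns a transformed DataFrame. As of pandas 2.1, applymap() is deprecated in favor of DataFrame.map(), which behaves the same way.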

The syntax for this method is as follows:

DataFrame.applymap(func, na_action=None, **kwargs)

Parameter	Description
`func`	A callable that takes a single value and returns a single value.
`na_action`	The options are: `None`/'ignore'. 'ignore': propagates `NaN` values without passing them to `func`. By default, `None`.
`**kwargs`	Additional keyword arguments to pass to `func`.
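The following sketch shows the effect of na_action='ignore' on a DataFrame that contains a missing price. The missing value is invented for this illustration. Because applymap() is deprecated in newer pandas releases, the sketch picks DataFrame.map() when it exists and falls back to applymap() otherwise.

```python
import numpy as np
import pandas as pd

# Reduced data with one missing price (added for illustration only).
df = pd.DataFrame({'Tops':  [10.12, np.nan, 13.95],
                   'Tanks': [11.35, 13.45, 14.98]})

# DataFrame.map replaces applymap in pandas 2.1+; older versions only have applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap

# na_action='ignore' leaves NaN untouched instead of passing it to the function.
labels = elementwise(lambda x: f'${x:.2f}', na_action='ignore')
print(labels)
```

Without na_action='ignore', the lambda would also receive the NaN and format it as a string, which is rarely what is wanted.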

For this example, any item priced at 13.45 has an 'M' appended to the end. This initial indicates that the item price needs to be adjusted. The M stands for Modify.

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

result = df.applymap(lambda x: str(x) + 'M' if x == 13.45 else x)
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
Line [3-4] creates and sets the index for the DataFrame.
Line [5] uses applymap() with a lambda to search for the price of 13.45. If found, an 'M' appends to the end. The output saves to the result variable.
Line [6] outputs the result to the terminal.

Output

	Tops	Tanks	Pants	Sweats
Small	$10.12	$11.35	$21.37	$27.15
Medium	$12.23	13.45M	$56.99	$21.85
Large	$13.95	$14.98	$94.87	$35.75

DataFrame pipe()

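The pipe() method takes a function and calls it with the entire DataFrame as its first argument, returning whatever the function returns. This makes it convenient for chaining custom functions that expect a Series or DataFrame.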

The syntax for this method is as follows:

DataFrame.pipe(func, *args, **kwargs)

Parameter	Description
`func`	The function to apply to the Series/DataFrame. `args` and `**kwargs` are passed to it after the DataFrame itself.
`args`	Optional positional arguments passed to `func`.
`**kwargs`	Optional keyword arguments passed to `func`.
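To make the role of args and **kwargs concrete, here is a minimal sketch that chains two pipe() calls. The discount and add_total helpers, the 10% rate, and the 'Outfit' column name are all hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

# Hypothetical helpers: each receives the whole DataFrame as its
# first argument and returns a new DataFrame.
def discount(frame, column, pct):
    out = frame.copy()
    out[column] = (out[column] * (1 - pct)).round(2)
    return out

def add_total(frame, name='Total'):
    return frame.assign(**{name: frame.sum(axis=1)})

# Positional and keyword arguments after the function are forwarded to it.
result = (df.pipe(discount, 'Pants', pct=0.10)
            .pipe(add_total, name='Outfit'))
print(result)
```

Each step reads left to right, which is the main reason to prefer pipe() over nesting the calls as add_total(discount(df, ...), ...).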

Rivers Clothing realized the pricing for Pants is a little too high and needs adjusting. The pipe method with a custom function is perfect for performing this price adjustment!

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_

def change_price(x):
    x['Pants'] = [21.50, 36.95, 55.72]
    return x

result = df.pipe(change_price)
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
Line [3-4] creates and sets the index for the DataFrame.
Line [5-7] creates the change_price function. This function changes the price for each item in the Pants category.
Line [8] passes the DataFrame to the change_price function through pipe() and saves the output to the result variable.
Line [9] outputs the result to the terminal.

Output

	Tops	Tanks	Pants	Sweats
Small	$10.12	$11.35	$21.50	$27.15
Medium	$12.23	$13.45	$36.95	$21.85
Large	$13.95	$14.98	$55.72	$35.75
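Note that change_price assigns to the DataFrame it receives, so df itself is also modified. If the original prices should be preserved, one possible variation (a sketch on reduced data, not the article's original code) is to build the result with assign(), which returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Tops':  [10.12, 12.23, 13.95],
                   'Pants': [21.37, 56.99, 94.87]},
                  index=['Small', 'Medium', 'Large'])

def change_price(x):
    # assign() returns a new DataFrame, so the caller's data is left untouched.
    return x.assign(Pants=[21.50, 36.95, 55.72])

result = df.pipe(change_price)
print(df['Pants'].tolist())      # original prices
print(result['Pants'].tolist())  # adjusted prices
```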

DataFrame agg() & aggregate()

The DataFrame agg() and aggregate() methods are identical: agg() is an alias for aggregate(). Both apply one or more aggregations along the specified axis.

This method can return one of the following:

Scalar: when the Series.agg method is called with a single function.
Series: when the code calls the DataFrame.agg method and uses a single function.
DataFrame: when the DataFrame.agg method is called with several functions.
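The three return types listed above can be checked with a short sketch on a reduced version of the data. The variable names are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops':  [10.12, 12.23, 13.95],
                   'Pants': [21.37, 56.99, 94.87]})

scalar = df['Tops'].agg('max')   # Series.agg with a single function
series = df.agg('max')           # DataFrame.agg with a single function
frame = df.agg(['min', 'max'])   # DataFrame.agg with several functions

print(type(scalar).__name__, type(series).__name__, type(frame).__name__)
```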

The syntax for this method is as follows:

DataFrame.agg(func=None, axis=0, *args, **kwargs)

Parameter	Description
`func`	The function(s) used to aggregate the data: a function, a function name as a string, a list of either, or a dictionary mapping axis labels to them.
`axis`	If zero (0) or 'index', apply the function to each column. If one (1) or 'columns', apply the function to each row. By default, 0.
`args`	Optional positional arguments passed to `func`.
`**kwargs`	Optional keyword arguments passed to `func`.
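As a brief illustration of the func and axis parameters, the sketch below passes a dictionary to aggregate each column differently and then aggregates across rows with axis=1. The choice of columns and functions is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Tops':  [10.12, 12.23, 13.95],
                   'Pants': [21.37, 56.99, 94.87]},
                  index=['Small', 'Medium', 'Large'])

# A dictionary maps each column label to its own aggregation.
per_column = df.agg({'Tops': 'min', 'Pants': 'max'})

# axis=1 aggregates across the columns, giving one value per row.
per_row = df.agg('mean', axis=1)

print(per_column)
print(per_row)
```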

For this example, Rivers Clothing needs to determine its highest and lowest priced items.

df = pd.DataFrame({'Tops':     [10.12, 12.23, 13.95],
                   'Tanks':   [11.35, 13.45, 14.98],
                   'Pants':   [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]})

pd.options.display.float_format = '${:.2f}'.format

result = df.agg(['min', 'max'])
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
Line [3] retrieves the min and max prices for each column. This output saves to the result variable.
Line [4] outputs the result to the terminal.

Output

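Comparing the DataFrame with the output below shows that, for Tops, Tanks, and Pants, size Small has the lowest price and size Large has the highest. For Sweats, the lowest price (21.85) belongs to the second row (Medium), while Large is still the highest.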

	Tops	Tanks	Pants	Sweats
min	$10.12	$11.35	$21.37	$21.85
max	$13.95	$14.98	$94.87	$35.75
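The min/max output does not say which size each price belongs to. One way to recover that, shown here as a supplementary sketch that assumes the Small/Medium/Large index from the earlier examples, is to use idxmin() and idxmax().

```python
import pandas as pd

df = pd.DataFrame({'Tops':   [10.12, 12.23, 13.95],
                   'Tanks':  [11.35, 13.45, 14.98],
                   'Pants':  [21.37, 56.99, 94.87],
                   'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# idxmin()/idxmax() return the index label of the lowest/highest value per column.
cheapest = df.idxmin()
priciest = df.idxmax()

print(cheapest)
print(priciest)
```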