The Pandas DataFrame has several Function Application, GroupBy, and Window methods. When applied to a DataFrame, these methods transform or summarize its data and return the result.
Part 1 of this series focuses on Function Applications and delves into each of the following methods.
Preparation
Before any data manipulation can occur, two (2) new libraries require installation.
- The Pandas library enables access to/from a DataFrame.
- The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter> key on the keyboard to start the installation process.
If an installation succeeds, the terminal displays a message confirming it.
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import numpy as np
DataFrame apply()
The apply() method applies a function along an axis of a DataFrame. This method returns a Series or a DataFrame; which one depends on the axis parameter described below and on what the function returns.
The syntax for this method is as follows:
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
Parameter | Description |
---|---|
func | This parameter is a function applied to either row(s) or column(s). This parameter depends on the axis selected. |
axis | If zero (0) or index is selected, apply func to each column. If one (1) or columns is selected, apply func to each row. By default, 0. |
raw | This determines whether func receives each row or column as a Series or as an ndarray. False passes each row/column as a Series (the default). True passes an ndarray instead, which can perform much better when func is a NumPy reduction function. |
result_type | This parameter only applies when axis=1 (columns). expand: list-like results are converted to columns. reduce: returns a Series, if possible, rather than expanding list-like results; the opposite of expand. broadcast: results are broadcast to the original DataFrame shape, and the original index and columns are retained. By default, None, in which case the return type depends on what func returns. |
args | Positional arguments to pass to func in addition to the row or column. By default, an empty tuple (). |
**kwargs | Additional keyword arguments to pass to func. |
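To make the axis, raw, args, and result_type parameters more concrete, here is a minimal sketch on a two-column slice of the clothing data. The add_markup function and the 10% markup are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98]},
                  index=['Small', 'Medium', 'Large'])

# axis=0 (default): the function receives each column as a Series.
col_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series.
row_max = df.apply(lambda row: row.max(), axis=1)

# raw=True: the function receives a NumPy ndarray instead of a Series.
row_range = df.apply(lambda arr: arr.max() - arr.min(), axis=1, raw=True)

# args: extra positional arguments passed to the function after the column.
def add_markup(col, pct):
    # Hypothetical helper: raises every price in the column by pct (0.10 = 10%).
    return col * (1 + pct)

marked_up = df.apply(add_markup, args=(0.10,))

# result_type='expand' (axis=1 only): list-like results become separate columns.
expanded = df.apply(lambda row: [row.min(), row.max()], axis=1, result_type='expand')

print(col_max, row_max, row_range, marked_up, expanded, sep='\n\n')
```

With the default axis=0, col_max holds one value per product; with axis=1, row_max holds one value per size.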
Rivers Clothing has completed a market analysis of its product pricing. It has determined that pricing on Tops and Tanks falls well below the profit margins of its other lines. Use the apply() method with a lambda function to update these prices.
Code - Example 1
df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98], 'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]})
pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_
result = df.apply(lambda x: x*2 if x.name in ['Tops', 'Tanks'] else x)
print(result)
- Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
- Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
- Lines [3-4] create and set the index for the DataFrame (Small/Medium/Large).
- Line [5] uses the apply() method with a lambda. Because axis defaults to 0, the lambda receives each column as a Series, and x.name holds the column label. The lambda multiplies each element in Tops and Tanks by two (2); other pricing remains unchanged. The output saves to the result variable.
- Line [6] outputs the result to the terminal.
Output
| | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Small | $20.24 | $22.70 | $21.37 | $27.15 |
Medium | $24.46 | $26.90 | $56.99 | $21.85 |
Large | $27.90 | $29.96 | $94.87 | $35.75 |
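Because the lambda in Example 1 only selects columns by name, the same result can also be produced without apply() by multiplying the selected columns directly. This sketch assumes the same data; result is built from a copy, so df keeps the original prices.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98],
                   'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# Work on a copy so the original prices stay untouched.
result = df.copy()

# Double only the Tops and Tanks columns in a single vectorized step.
result[['Tops', 'Tanks']] = result[['Tops', 'Tanks']] * 2

print(result)
```

Selecting columns directly avoids calling a Python function once per column, which is usually the simpler choice when the transformation is plain arithmetic.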
This example uses the apply() method with np.sum. This code calculates the total of the listed prices for each product type.
Code - Example 2
df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98], 'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]})
pd.options.display.float_format = '${:.2f}'.format
result = df.apply(np.sum, axis=0)
print(result)
- Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
- Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
- Line [3] uses apply() with np.sum. With axis=0, np.sum receives each column in turn, so the result holds one total per product type. The output saves to the result variable.
- Line [4] outputs the result to the terminal.
Output
Tops | $36.30 |
Tanks | $39.78 |
Pants | $173.23 |
Sweats | $84.75 |
dtype: float64
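Example 2 sums down each column. Setting the axis parameter to 1 sums across each row instead, which here gives the total of the listed prices for each size. A minimal sketch, assuming the same data with the size index from Example 1 added:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98],
                   'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# axis=0 (as in Example 2): np.sum receives each column, one total per product type.
per_product = df.apply(np.sum, axis=0)

# axis=1: np.sum receives each row, one total per size.
per_size = df.apply(np.sum, axis=1)

print(per_product, per_size, sep='\n\n')
```

For plain sums, the built-in df.sum() and df.sum(axis=1) compute the same totals without apply().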
DataFrame applymap()
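The applymap() method applies a function element-wise to a DataFrame and returns a transformed DataFrame. As of pandas 2.1.0, applymap() is deprecated in favor of the equivalent DataFrame.map() method, so the examples below emit a deprecation warning on newer versions.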
The syntax for this method is as follows:
DataFrame.applymap(func, na_action=None, **kwargs)
Parameter | Description |
---|---|
func | A Python function that accepts a single value and returns a single value. |
na_action | The options are None or 'ignore'. With 'ignore', NaN values propagate and are not passed to func. By default, None. |
**kwargs | Additional keyword arguments to pass to func. |
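The na_action parameter is easiest to see on data with missing values. The sketch below assumes two columns of the clothing data with NaN gaps added for illustration, and a hypothetical label() formatter. Because pandas 2.1.0 renamed applymap() to DataFrame.map(), it uses whichever method the installed version provides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, np.nan, 13.95], 'Tanks': [11.35, 13.45, np.nan]},
                  index=['Small', 'Medium', 'Large'])

def label(price):
    # Hypothetical formatter: turns a single value into a price string.
    return f'${price:.2f}'

# pandas 2.1.0 renamed applymap() to map(); fall back on older versions.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# na_action='ignore': NaN cells stay NaN and are never passed to label().
ignored = elementwise(label, na_action='ignore')

# Default (na_action=None): NaN is passed to label() like any other value.
passed = elementwise(label)

print(ignored, passed, sep='\n\n')
```

With the default, the missing Tops price for Medium is formatted as the string '$nan'; with 'ignore', it remains a real missing value.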
For this example, any item priced at 13.45 has an 'M' appended to the end. This initial indicates that the item price needs to be adjusted. The M stands for Modify.
df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98], 'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]})
pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_
result = df.applymap(lambda x: str(x) + 'M' if x == 13.45 else x)
print(result)
- Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
- Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
- Lines [3-4] create and set the index for the DataFrame.
- Line [5] uses applymap() with a lambda that checks each value for the price 13.45. If found, the value converts to a string with an 'M' appended to the end. The output saves to the result variable.
- Line [6] outputs the result to the terminal.
Output
| | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Small | $10.12 | $11.35 | $21.37 | $27.15 |
Medium | $12.23 | 13.45M | $56.99 | $21.85 |
Large | $13.95 | $14.98 | $94.87 | $35.75 |
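The lambda above compares floats with ==, which works here because the data and the lambda use the same literal 13.45. For prices produced by arithmetic, an exact comparison can miss values that differ only by rounding error. The sketch below is one way to guard against that, assuming a tolerance of 1e-9 is acceptable; the flag() function and the adjusted Large Tops price are hypothetical. It also shows **kwargs forwarding an extra argument to the function.

```python
import math
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98]},
                  index=['Small', 'Medium', 'Large'])

# Hypothetical adjustment: a price that comes from arithmetic, not a literal.
df.loc['Large', 'Tops'] = 13.40 + 0.05

def flag(price, target):
    # Appends 'M' when price is within a small tolerance of target.
    if math.isclose(price, target, abs_tol=1e-9):
        return f'{price}M'
    return price

# pandas 2.1.0 renamed applymap() to map(); fall back on older versions.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# Extra keyword arguments (here, target) are forwarded to flag().
result = elementwise(flag, target=13.45)
print(result)
```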
DataFrame pipe()
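The pipe() method passes the entire DataFrame to a function and returns whatever that function returns. Calling df.pipe(func) is equivalent to func(df), which makes pipe() handy for chaining functions that take and return a DataFrame.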
The syntax for this method is as follows:
DataFrame.pipe(func, *args, **kwargs)
Parameter | Description |
---|---|
func | The function to apply to the Series/DataFrame. The args and **kwargs arguments are passed to this function. |
args | An optional iterable of positional arguments passed to func. |
**kwargs | An optional dictionary of keyword arguments passed to func. |
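The main benefit of pipe() shows up when several DataFrame functions run in sequence. The sketch below uses two hypothetical helpers, discount() and round_prices(), and an assumed 10% discount on Pants, to show that a pipe() chain produces the same DataFrame as the equivalent nested calls, with args and **kwargs forwarded to each function.

```python
import pandas as pd

df = pd.DataFrame({'Pants': [21.37, 56.99, 94.87]}, index=['Small', 'Medium', 'Large'])

def discount(frame, pct, column='Pants'):
    # Hypothetical helper: returns a new DataFrame with column reduced by pct.
    out = frame.copy()
    out[column] = out[column] * (1 - pct)
    return out

def round_prices(frame, decimals=2):
    # Hypothetical helper: returns a new DataFrame rounded to `decimals` places.
    return frame.round(decimals)

# df.pipe(f, *args, **kwargs) is equivalent to f(df, *args, **kwargs).
chained = df.pipe(discount, 0.10).pipe(round_prices, decimals=2)
nested = round_prices(discount(df, 0.10), decimals=2)

print(chained.equals(nested))
print(chained)
```

The chained form reads left to right in the order the steps run, whereas the nested form has to be read from the inside out.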
Rivers Clothing realized the pricing for Pants is a little too high and needs adjusting. The pipe() method with a custom function is perfect for performing this price adjustment!
df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98], 'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]})
pd.options.display.float_format = '${:.2f}'.format
index_ = ['Small', 'Medium', 'Large']
df.index = index_
def change_price(x):
    x['Pants'] = [21.50, 36.95, 55.72]
    return x
result = df.pipe(change_price)
print(result)
- Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
- Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
- Lines [3-4] create and set the index for the DataFrame.
- Lines [5-7] create the change_price function. This function replaces the price for each size in the Pants column and returns the DataFrame.
- Line [8] passes df to change_price via pipe() and saves the output to the result variable. Because pipe() hands over df itself, change_price also modifies df in place.
- Line [9] outputs the result to the terminal.
Output
| | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Small | $10.12 | $11.35 | $21.50 | $27.15 |
Medium | $12.23 | $13.45 | $36.95 | $21.85 |
Large | $13.95 | $14.98 | $55.72 | $35.75 |
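Because pipe() passes the DataFrame itself to change_price, the assignment inside the function also modifies the original df. If the original prices should be preserved, one minimal variation (an assumption about the desired behavior, not part of the example above) copies the DataFrame before changing it:

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98],
                   'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

def change_price(x):
    # Copy first so the caller's DataFrame is not modified in place.
    x = x.copy()
    x['Pants'] = [21.50, 36.95, 55.72]
    return x

result = df.pipe(change_price)
print(result['Pants'].tolist())  # new Pants prices
print(df['Pants'].tolist())      # original Pants prices, unchanged
```

Returning x.assign(Pants=[21.50, 36.95, 55.72]) instead would have the same effect, since assign() also returns a new DataFrame.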
DataFrame agg() & aggregate()
The DataFrame agg() and aggregate() methods are identical: agg() is an alias for aggregate(). Both apply one or more aggregation functions across single or multiple columns.
This method can return one of the following:
- Scalar: when the Series.agg method is called with a single function.
- Series: when the DataFrame.agg method is called with a single function.
- DataFrame: when the DataFrame.agg method is called with several functions.
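The three return types listed above can be checked directly. This short sketch uses two columns of the clothing data and the string function names 'min' and 'max'; it also confirms that aggregate() gives the same result as agg().

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98]},
                  index=['Small', 'Medium', 'Large'])

scalar = df['Tops'].agg('min')   # Series.agg with one function -> scalar
series = df.agg('min')           # DataFrame.agg with one function -> Series
frame = df.agg(['min', 'max'])   # DataFrame.agg with several functions -> DataFrame

# aggregate() is the same method under its longer name.
same = df.aggregate(['min', 'max']).equals(frame)

print(type(scalar).__name__, type(series).__name__, type(frame).__name__, same)
```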
The syntax for this method is as follows:
DataFrame.agg(func=None, axis=0, *args, **kwargs)
Parameter | Description |
---|---|
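func | The function(s) used to aggregate the data: a function, a string function name, a list of functions and/or function names, or a dictionary mapping axis labels to any of these. |
axis | If zero (0) or index is selected, apply func to each column. If one (1) or columns is selected, apply func to each row. By default, 0. |
args | Optional positional arguments passed to func. |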
**kwargs | This parameter is keyword arguments passed to func . |
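When func is a dictionary, each column can get its own aggregation(s). In this sketch (the choice of columns and functions is only for illustration), Tops gets only its minimum while Pants gets both its minimum and maximum; pandas fills the combinations that were not requested with NaN.

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98],
                   'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# Map column labels to the list of functions to run on that column.
result = df.agg({'Tops': ['min'], 'Pants': ['min', 'max']})
print(result)
```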
For this example, Rivers Clothing needs to determine its highest and lowest priced items.
df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98], 'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]})
pd.options.display.float_format = '${:.2f}'.format
result = df.agg(['min', 'max'])
print(result)
- Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df.
- Line [2] formats the output with a dollar sign ($) and two (2) decimal places.
- Line [3] retrieves the min and max prices for each column. This output saves to the result variable.
- Line [4] outputs the result to the terminal.
Output
Upon reviewing the DataFrame and the output below, size Large has the highest price in every category. Size Small has the lowest price in every category except Sweats, where size Medium ($21.85) is the lowest.
| | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
min | $10.12 | $11.35 | $21.37 | $21.85 |
max | $13.95 | $14.98 | $94.87 | $35.75 |
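To confirm which size holds each column's lowest and highest price, agg() also accepts the names of the idxmin and idxmax methods, which return index labels instead of values. A short sketch on the same data, with the size index from the earlier examples added:

```python
import pandas as pd

df = pd.DataFrame({'Tops': [10.12, 12.23, 13.95], 'Tanks': [11.35, 13.45, 14.98],
                   'Pants': [21.37, 56.99, 94.87], 'Sweats': [27.15, 21.85, 35.75]},
                  index=['Small', 'Medium', 'Large'])

# 'idxmin' / 'idxmax' return the index label (the size) of each column's min/max.
where = df.agg(['idxmin', 'idxmax'])
print(where)
```

On this data, the idxmin row shows Medium for Sweats and Small for the other three columns.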