Pandas DataFrame Computations and Descriptive Stats

The Pandas DataFrame has several methods concerning Computations and Descriptive Stats. When applied to a DataFrame, these methods evaluate the elements and return the results.

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), and cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), and kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates(), and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail().

Getting Started

Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

Required Starter Code

import pandas as pd
import numpy as np 

Before any data manipulation can occur, two new libraries will require installation.

  • The pandas library enables access to/from a DataFrame.
  • The numpy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
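
One quick way to confirm that both libraries are available is to import them and print their versions. This is only a minimal sketch; the version numbers printed depend on your environment.

```python
import pandas as pd
import numpy as np

# Each library exposes its installed version as a string.
print(pd.__version__)
print(np.__version__)
```

If either import fails with a ModuleNotFoundError, the corresponding installation step above likely did not complete.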

DataFrame abs()

The abs() method returns a DataFrame with the absolute value of each element, so any negative value becomes positive. This method has no parameters and applies only to numeric elements. An alternative to the abs() method is numpy.absolute().

The syntax for this method is as follows:

DataFrame.abs()

For this example, the Sales Manager of Rivers Clothing noticed that some of their inventory contained negative pricing. To resolve this issue, the Sales Manager ran the following code.

Code – Example 1

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':   [44, 43, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':  [88, 38, 13]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = df_inv.abs()
print(result)
  • Line [1-4] creates a DataFrame from a dictionary of lists and saves it to df_inv.
  • Line [6-7] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [9] uses the abs() method to convert negative values to positive (absolute) values. The output saves to the result variable.
  • Line [10] outputs the result to the terminal.

Output:

        Tops  Tanks  Pants  Sweats
Small     36     44     61      88
Medium    23     43     33      38
Large     19     20     67      13

This example is similar to the above. However, it calls numpy.absolute() to change negative values to positive (absolute) values.

Code – Example 2

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':   [44, 43, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':  [88, 38, 13]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = np.absolute(df_inv)
print(result)
  • Line [1-4] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [6-7] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [9] uses np.absolute() to convert any negative values to positive (absolute) values. The output saves to the result variable.
  • Line [10] outputs the result to the terminal. The output is identical to the example above.
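
As a quick sanity check, the two approaches can be compared directly. The sketch below reuses the inventory data from the examples above and uses DataFrame.equals() for the comparison; the variable names abs_pandas, abs_numpy, and same are chosen for illustration only.

```python
import pandas as pd
import numpy as np

df_inv = pd.DataFrame({'Tops':   [36, 23, 19],
                       'Tanks':  [44, 43, -20],
                       'Pants':  [61, -33, 67],
                       'Sweats': [88, 38, 13]},
                      index=['Small', 'Medium', 'Large'])

# Absolute values computed two ways.
abs_pandas = df_inv.abs()
abs_numpy = np.absolute(df_inv)

# equals() compares shape, labels, and element values.
same = abs_pandas.equals(abs_numpy)
print(same)
```

Both calls return a DataFrame with the same index and columns, so either one can be used wherever the other appears in this article.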

DataFrame all()

The all() method determines if all elements over a specified axis resolve to True.

The syntax for this method is as follows:

DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameter   Description
axis        If zero (0) or index is selected, apply the function to each column. Default is 0.
            If one (1) is selected, apply the function to each row.
bool_only   Includes only Boolean DataFrame columns. If None, this parameter will attempt to use everything. Not supported for Series.
skipna      This parameter excludes NaN/NULL values.
            If an entire row/column is NaN and skipna=True, the result is True, as for an empty row/column. If skipna=False, NaN values are treated as True because they are not equal to 0.
level       If the axis is a MultiIndex, count along a specific level, collapsing into a Series.
**kwargs    Additional keywords have no effect.

For this example, the Rivers Clothing Warehouse Manager needs to find out what is happening with the inventory for Tanks. Something is amiss!

Code – Example 1

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [0, 0, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

result = df_inv.Tanks.all(skipna=False)
print(result)
  • Line [1-4] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [6] checks all elements of Tanks and saves True/False to the result variable.
  • Line [7] outputs the result to the terminal.

Output:

False

The above example checked only the Tanks column. Calling all() on the DataFrame itself checks every column at once and returns one True/False value per column.

Code – Example 2

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [0, 0, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

result = df_inv.all()
print(result)

Output:

Tops       True
Tanks     False
Pants      True
Sweats     True
dtype: bool
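
The axis parameter can also be set to 1 to evaluate each row instead of each column. The sketch below reuses the same inventory data; the Small/Medium/Large index labels are an assumption added here for readability and are not part of the example above.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':   [36, 23, 19],
                       'Tanks':  [0, 0, -20],
                       'Pants':  [61, -33, 67],
                       'Sweats': [88, 38, 13]},
                      index=['Small', 'Medium', 'Large'])

# axis=1 checks each row: True only if every item in that size is non-zero.
rows_all_nonzero = df_inv.all(axis=1)
print(rows_all_nonzero)
```

Here only the Large row resolves to True, because the Small and Medium rows each contain a zero in the Tanks column.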

DataFrame any()

The any() method determines whether any element along a specified axis resolves to True. It returns True if at least one element on that axis is non-zero or non-empty; otherwise, it returns False.

The syntax for this method is as follows:

DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameter   Description
axis        If zero (0) or index is selected, apply the function to each column. Default is 0.
            If one (1) is selected, apply the function to each row.
bool_only   Includes only Boolean DataFrame columns. If None, this parameter will attempt to use everything. Not supported for Series.
skipna      This parameter excludes NaN/NULL values.
            If an entire row/column is NaN and skipna=True, the result is False, as for an empty row/column. If skipna=False, NaN values are treated as True because they are not equal to 0.
level       If the axis is a MultiIndex, count along a specific level, collapsing into a Series.
**kwargs    Additional keywords have no effect.

For this example, Rivers Clothing assumes every item in their inventory contains a valid value. To confirm this, run the following code.

df_inv = pd.DataFrame({'Tops':     [36, 23, 0],
                       'Tanks':    [10, 20, 0],
                       'Pants':    [61, 33, 0],
                       'Sweats':  [88, 38, 0]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = df_inv.any(axis='columns')
print(result)
  • Line [1-4] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [6-7] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [9] checks all elements of the DataFrame based on the specified axis and saves to the result variable.
  • Line [10] outputs the result to the terminal.

Output:

It appears there is an issue with the Large size of all items in inventory.

Small      True
Medium     True
Large     False
dtype: bool
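
The Boolean Series returned by any() can also be used as a mask to pull out the affected rows. The following is a minimal sketch using the same data; the names has_stock and out_of_stock are hypothetical and chosen for illustration.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':   [36, 23, 0],
                       'Tanks':  [10, 20, 0],
                       'Pants':  [61, 33, 0],
                       'Sweats': [88, 38, 0]},
                      index=['Small', 'Medium', 'Large'])

# True for each size that has at least one non-zero item.
has_stock = df_inv.any(axis='columns')

# Invert the mask to keep only the sizes where every item is zero.
out_of_stock = df_inv[~has_stock]
print(out_of_stock)
```

With this data, only the Large row remains, which matches the issue identified above.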

DataFrame clip()

The clip() method assigns values outside the boundary to boundary values. Thresholds can be singular values or array-like, and in the latter case, the clipping is performed element-wise in the specified axis.

The syntax for this method is as follows:

DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Parameter        Description
lower            This parameter is the minimum threshold value. By default, the value is None.
upper            This parameter is the maximum threshold value. By default, the value is None.
axis             This parameter aligns the object with lower and upper along the specified axis. Default is None.
                 If zero (0) or index is selected, array-like thresholds are matched against the row labels.
                 If one (1) or columns is selected, array-like thresholds are matched against the column labels.
inplace          If True, the operation is performed in place on the data. Default is False.
*args, **kwargs  Additional keywords have no effect.

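For this example, Rivers Clothing is having a sale that caps prices at $25.00. Only the Pants in sizes Medium and Large are currently priced above $25.00, so those are the prices the upper bound changes. The lower bound of $10.00 also raises the one price below it (Small Tanks at $9.99).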

df_prices = pd.DataFrame({'Tops':  [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

index_ = ['Small', 'Medium', 'Large']
df_prices.index = index_

result = df_prices.clip(10, 25, axis='rows')
print(result)
  • Line [1-4] creates a DataFrame from a Dictionary of Lists and saves it to df_prices.
  • Line [6-7] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [9] checks each element for the lower and upper limits and updates accordingly. The output saves to the result variable.
  • Line [10] outputs the result to the terminal.

Output:

         Tops  Tanks  Pants  Sweats
Small   10.22  10.00  24.95   18.99
Medium  12.45  10.99  25.00   19.99
Large   17.45  11.99  25.00   21.99
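
The example above uses single values for both thresholds. As noted at the start of this section, thresholds can also be array-like. The sketch below assumes a hypothetical set of per-size price caps (upper_by_size, not part of the original example) and passes axis=0 so that each cap is matched to a row by its index label.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops':   [10.22, 12.45, 17.45],
                          'Tanks':  [9.99, 10.99, 11.99],
                          'Pants':  [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]},
                         index=['Small', 'Medium', 'Large'])

# Hypothetical maximum price for each size (an assumption for illustration).
upper_by_size = pd.Series([20.00, 25.00, 30.00],
                          index=['Small', 'Medium', 'Large'])

# axis=0 aligns the Series with the row labels, so each row gets its own cap.
result = df_prices.clip(upper=upper_by_size, axis=0)
print(result)
```

With these caps, only the Pants column changes, because it is the only column with prices above the cap for each size.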

DataFrame corr()

The corr() method computes the pair-wise correlation of columns, excluding NaN and NULL values.

The syntax for this method is as follows:

DataFrame.corr(method='pearson', min_periods=1)
Parameter     Description
method        The possible correlation methods are:
              'pearson': standard correlation coefficient (the default).
              'kendall': Kendall Tau correlation coefficient.
              'spearman': Spearman rank correlation.
              callable: a function that takes two (2) 1D ndarrays and returns a float.
min_periods   Minimum number of observations required per pair of columns to have a valid result. This option is only available for the Pearson and Spearman correlations.

For this example, Rivers Clothing computes the correlation between the price columns.

df_prices = pd.DataFrame({'Tops':  [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

result = df_prices.corr()
print(result)
  • Line [1-4] creates a DataFrame from a Dictionary of Lists and saves it to df_prices.
  • Line [6] applies the correlation method. The output saves to the result variable.
  • Line [7] outputs the result to the terminal.

Output:

            Tops     Tanks     Pants    Sweats
Tops    1.000000  0.976398  0.997995  0.999620
Tanks   0.976398  1.000000  0.960769  0.981981
Pants   0.997995  0.960769  1.000000  0.995871
Sweats  0.999620  0.981981  0.995871  1.000000
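
The method parameter changes how the correlation is measured. As a minimal sketch with the same price data, the code below requests the Spearman rank correlation instead of the default Pearson coefficient. Because every column rises from the first row to the last, the ranks agree perfectly across all columns.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops':   [10.22, 12.45, 17.45],
                          'Tanks':  [9.99, 10.99, 11.99],
                          'Pants':  [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]})

# Spearman compares the rank order of values rather than the values themselves.
spearman = df_prices.corr(method='spearman')
print(spearman)
```

The Pearson values above are slightly below 1 because the prices do not rise by proportional amounts, whereas the rank-based result only reflects that they rise in the same order.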

DataFrame corrwith()

The corrwith() method computes the pair-wise correlation between the rows or columns of a DataFrame and those of another Series or DataFrame. A detailed article on this method is available from the Finxter Academy.
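
As a brief illustration, the sketch below correlates each price column with a separate Series. The units_sold figures are hypothetical values invented for this example and are not Rivers Clothing data from the article.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops':   [10.22, 12.45, 17.45],
                          'Tanks':  [9.99, 10.99, 11.99],
                          'Pants':  [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]},
                         index=['Small', 'Medium', 'Large'])

# Hypothetical units sold per size (an assumption for illustration).
units_sold = pd.Series([100, 90, 80], index=['Small', 'Medium', 'Large'])

# Each column of df_prices is correlated with units_sold, matched by index label.
result = df_prices.corrwith(units_sold)
print(result)
```

The result is a Series with one correlation value per column. In this made-up data, prices rise while units sold fall, so every value is negative.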