Pandas DataFrame abs(), all(), any(), clip(), corr()

The Pandas DataFrame has several methods concerning Computations and Descriptive Stats. When applied to a DataFrame, these methods evaluate the elements and return the results.


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
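
As an optional extra check (a minimal sketch, not a required step), both libraries can be imported and their versions printed. The version numbers shown on your system will differ.

```python
import pandas as pd
import numpy as np

# Print the installed versions; an ImportError here means an install failed.
print('pandas:', pd.__version__)
print('numpy:', np.__version__)
```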


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import numpy as np 

DataFrame abs()

The abs() method returns a DataFrame in which each numeric element is replaced by its absolute value, so any negative value becomes positive. This method has no parameters. An alternative to the abs() method is numpy.absolute().

The syntax for this method is as follows:

DataFrame.abs()

For this example, the Sales Manager of Rivers Clothing noticed that some of their inventory contained negative pricing. To resolve this issue, the Sales Manager ran the following code.

Code – Example 1

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [44, 43, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = df_inv.abs()
print(result)
  • Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
  • Line [2-3] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [4] uses the abs() method to convert negative values to positive (absolute) values. The output saves to the result variable.
  • Line [5] outputs the result to the terminal.

Output

        Tops  Tanks  Pants  Sweats
Small     36     44     61      88
Medium    23     43     33      38
Large     19     20     67      13

This example is similar to the above. However, it calls numpy.absolute() to change negative values to positive (absolute) values. The output remains the same.

Code – Example 2

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [44, 43, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = np.absolute(df_inv)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [2-3] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [4] uses np.absolute() to convert any negative values to positive (absolute) values. The output saves to the result variable.
  • Line [5] outputs the result to the terminal. The output is identical to the example above.
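
Both approaches return a new DataFrame and leave the original data unchanged. The sketch below illustrates this with a trimmed-down version of the inventory (only two columns, chosen here for brevity).

```python
import pandas as pd
import numpy as np

# A reduced, illustrative version of the inventory DataFrame.
df_inv = pd.DataFrame({'Tanks': [44, 43, -20],
                       'Pants': [61, -33, 67]},
                      index=['Small', 'Medium', 'Large'])

result = df_inv.abs()

# abs() returns a new DataFrame; df_inv still holds the negative value.
print(df_inv.loc['Large', 'Tanks'])   # -20
print(result.loc['Large', 'Tanks'])   # 20

# np.absolute() produces the same values as abs().
print(result.equals(np.absolute(df_inv)))   # True
```

To keep the corrected values, assign the result back to a variable, as the examples above do.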

DataFrame all()

The all() method determines if all elements over a specified axis resolve to True.

The syntax for this method is as follows:

DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameter    Description
axis         If zero (0) or 'index', apply to each column. Default 0. If one (1) or 'columns', apply to each row.
bool_only    Includes only Boolean DataFrame columns. If None, this parameter will attempt to use everything. Not supported for Series.
skipna       Excludes NaN/NULL values. If an entire row/column is NaN and skipna=True, the result is True, as for an empty row/column. If skipna=False, NaN values are treated as True because they are not equal to 0.
level        If the axis is a MultiIndex, count along a specific level and collapse into a Series. This parameter is not available in pandas 2.0 and later.
**kwargs     Additional keywords have no effect.

For this example, the Rivers Clothing Warehouse Manager needs to find out what is happening with the inventory for Tanks. Something is amiss!

Code – Example 1

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [0, 0, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

result = df_inv.Tanks.all(skipna=False)
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [2] checks all elements of Tanks and saves True/False to the result variable.
  • Line [3] outputs the result to the terminal.

Output

False

In the above example, we used Tanks. However, calling all() on the whole DataFrame evaluates every column at once.

Code – Example 2

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [0, 0, -20],
                       'Pants':    [61, -33, 67],
                       'Sweats':   [88, 38, 13]})

result = df_inv.all()
print(result)

Output

Tops       True
Tanks     False
Pants      True
Sweats     True
dtype: bool
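
The axis parameter changes the direction of the check. As a short sketch using the same inventory data (the Small/Medium/Large index is added here for readability), axis='columns' evaluates each row, and axis=None reduces the whole DataFrame to a single Boolean.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops':   [36, 23, 19],
                       'Tanks':  [0, 0, -20],
                       'Pants':  [61, -33, 67],
                       'Sweats': [88, 38, 13]},
                      index=['Small', 'Medium', 'Large'])

# axis='columns' (or 1) evaluates each row instead of each column.
by_row = df_inv.all(axis='columns')
print(by_row)

# axis=None reduces over both axes and returns a single Boolean.
overall = df_inv.all(axis=None)
print(overall)
```

Here only the Large row contains no zero values, so it is the only row that evaluates to True, and the overall result is False.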

DataFrame any()

The any() method determines whether any element along a specified axis resolves to True. It returns True if at least one element along that axis is non-zero or non-empty; otherwise, it returns False.

The syntax for this method is as follows:

DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameter    Description
axis         If zero (0) or 'index', apply to each column. Default 0. If one (1) or 'columns', apply to each row.
bool_only    Includes only Boolean DataFrame columns. If None, this parameter will attempt to use everything. Not supported for Series.
skipna       Excludes NaN/NULL values. If an entire row/column is NaN and skipna=True, the result is False, as for an empty row/column. If skipna=False, NaN values are treated as True because they are not equal to 0.
level        If the axis is a MultiIndex, count along a specific level and collapse into a Series. This parameter is not available in pandas 2.0 and later.
**kwargs     Additional keywords have no effect.

For this example, Rivers Clothing assumes every item in their inventory contains a valid value. To confirm this, run the following code.

df_inv = pd.DataFrame({'Tops':     [36, 23, 0],
                       'Tanks':    [10, 20, 0],
                       'Pants':    [61, 33, 0],
                       'Sweats':   [88, 38, 0]})

index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_

result = df_inv.any(axis='columns')
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_inv.
  • Line [2-3] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [4] checks, for each row, whether any element is non-zero and saves the result to the result variable.
  • Line [5] outputs the result to the terminal.

Output

There is an issue with the Large size of all items in inventory. They all contain zero values.

Small      True
Medium     True
Large     False
dtype: bool
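
The effect of skipna is easiest to see on data that holds only missing values. The sketch below uses a hypothetical all-NaN Series (not part of the Rivers Clothing data) to compare any() and all().

```python
import pandas as pd
import numpy as np

# A hypothetical column in which every value is missing.
missing = pd.Series([np.nan, np.nan])

# With skipna=True (the default), the NaN values are dropped first,
# so the checks run on an empty set of values.
print(missing.any())              # False
print(missing.all())              # True

# With skipna=False, each NaN counts as True because it is not equal to zero.
print(missing.any(skipna=False))  # True
```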

DataFrame clip()

The clip() method assigns values outside the boundary to boundary values. Thresholds can be singular values or array-like, and in the latter case, the clipping is performed element-wise in the specified axis.

The syntax for this method is as follows:

DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Parameter         Description
lower             This parameter is the minimum threshold value. By default, the value is None.
upper             This parameter is the maximum threshold value. By default, the value is None.
axis              This parameter aligns the object with lower and upper along the specified axis. Zero (0) or 'index' matches array-like thresholds against the row labels; one (1) or 'columns' matches them against the column labels. By default, the value is None.
inplace           If True, the operation is performed in place on the data. By default, the value is False.
*args, **kwargs   Additional keywords have no effect.

For this example, Rivers Clothing is having a sale on Pants in sizes Medium and Large. Unfortunately, these prices are greater than the sale price of $25.00 and need to be modified.

df_prices = pd.DataFrame({'Tops':    [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

index_ = ['Small', 'Medium', 'Large']
df_prices.index = index_

result = df_prices.clip(10, 25, axis='rows')
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_prices.
  • Line [2-3] creates and sets the index for the DataFrame (Small/Medium/Large).
  • Line [4] checks each element for the lower and upper limits and updates accordingly. The output saves to the result variable.
  • Line [5] outputs the result to the terminal.

Output

         Tops  Tanks  Pants  Sweats
Small   10.22  10.00  24.95   18.99
Medium  12.45  10.99  25.00   19.99
Large   17.45  11.99  25.00   21.99
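
Note that clipping the whole DataFrame to a range of 10 to 25 also raises the Small Tanks price from 9.99 to 10.00. If only the Pants column should be capped, one option is to clip that column alone. The sketch below shows this, followed by an array-like threshold; the per-size caps in the caps Series are hypothetical values chosen for illustration.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops':    [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]},
                         index=['Small', 'Medium', 'Large'])

# Cap a single column with a scalar threshold; other columns are untouched.
pants_only = df_prices.copy()
pants_only['Pants'] = pants_only['Pants'].clip(upper=25)
print(pants_only)

# Hypothetical per-size caps: an array-like threshold matched
# against the row labels (axis=0).
caps = pd.Series([30.0, 25.0, 25.0], index=['Small', 'Medium', 'Large'])
per_size = df_prices.clip(upper=caps, axis=0)
print(per_size)
```

In both results the Medium and Large Pants prices become 25.00, while the Small Tanks price stays at 9.99 because no lower threshold is set.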

DataFrame corr()


The corr() method computes pair-wise correlation of columns. This does not include NaN and NULL values.

The syntax for this method is as follows:

DataFrame.corr(method='pearson', min_periods=1)
Parameter     Description
method        The possible correlation methods are:
              'pearson': standard correlation coefficient. By default, Pearson.
              'kendall': Kendall Tau correlation coefficient.
              'spearman': Spearman rank correlation.
              callable: a function that accepts two (2) 1D ndarrays and returns a float.
min_periods   The minimum number of observations required per pair of columns to have a valid result. This option is only available for the Pearson and Spearman correlations.
df_prices = pd.DataFrame({'Tops':    [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

result = df_prices.corr()
print(result)
  • Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_prices.
  • Line [2] applies the correlation method. The output saves to the result variable.
  • Line [3] outputs the result to the terminal.

Output

            Tops     Tanks     Pants    Sweats
Tops    1.000000  0.976398  0.997995  0.999620
Tanks   0.976398  1.000000  0.960769  0.981981
Pants   0.997995  0.960769  1.000000  0.995871
Sweats  0.999620  0.981981  0.995871  1.000000
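
The method parameter also accepts the other options listed above. The following sketch, using the same price data, tries 'spearman' and a callable. The helper pearson_by_hand is a hypothetical function written for this illustration and is not part of Pandas.

```python
import pandas as pd
import numpy as np

df_prices = pd.DataFrame({'Tops':    [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

# Spearman works on ranks. Every column here rises from the first row
# to the last, so all rank correlations equal 1.
spearman = df_prices.corr(method='spearman')
print(spearman)

# A callable receives two 1D ndarrays and must return a float.
# pearson_by_hand is a hypothetical helper built on np.corrcoef.
def pearson_by_hand(a, b):
    return np.corrcoef(a, b)[0, 1]

custom = df_prices.corr(method=pearson_by_hand)
print(np.allclose(custom, df_prices.corr()))
```

The 'kendall' option is left out of this sketch because, as far as we are aware, it relies on the SciPy library, which this article does not install.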

DataFrame corrwith()

The corrwith() method computes the pair-wise correlation between the rows or columns of a DataFrame and those of another Series or DataFrame. A detailed article on this method is available from the Finxter Academy.
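
As a minimal sketch of corrwith(), reusing the price data from the corr() example, the code below correlates every column with the Tops column. Choosing Tops as the comparison Series is an assumption made for illustration.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops':    [10.22, 12.45, 17.45],
                          'Tanks':   [9.99, 10.99, 11.99],
                          'Pants':   [24.95, 26.95, 32.95],
                          'Sweats':  [18.99, 19.99, 21.99]})

# Correlate each column of the DataFrame with a single Series.
result = df_prices.corrwith(df_prices['Tops'])
print(result)
```

The returned Series holds one coefficient per column and should match the Tops column of the corr() output above.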

Further Learning Resources

This is Part 1 of the DataFrame method series.

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
  • Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
  • Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
  • Part 13 focuses on the DataFrame methods droplevel(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
  • Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swaplevel(), stack(), unstack() and swapaxes()
  • Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), T and transpose()
  • Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
  • Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
  • Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
  • Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
  • Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
  • Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickle().
  • Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
  • Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
  • Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
  • Part 25 focuses on the serialization and conversion methods to_gbq() and to_coo().

Also, have a look at the Pandas DataFrame methods cheat sheet!