The Pandas DataFrame has several methods concerning Computations and Descriptive Stats. When applied to a DataFrame, these methods evaluate the elements and return the results.
Preparation
Before any data manipulation can occur, two (2) new libraries will require installation.
- The Pandas library enables access to/from a DataFrame.
- The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import numpy as np
DataFrame abs()
The abs() method converts each negative element in a DataFrame to its positive (absolute) value. This method has no parameters. An alternative to the abs() method is numpy.absolute().
The syntax for this method is as follows:
DataFrame.abs()
For this example, the Sales Manager of Rivers Clothing noticed that some of their inventory contained negative pricing. To resolve this issue, the Sales Manager ran the following code.
Code - Example 1
df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [44, 43, -20], 'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]})
index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_
result = df_inv.abs()
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Lines [2-3] create and set the index for the DataFrame (Small/Medium/Large).
- Line [4] uses the abs() method to convert negative values to positive (absolute) values. The output saves to the result variable.
- Line [5] outputs the result to the terminal.
Output
 | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Small | 36 | 44 | 61 | 88 |
Medium | 23 | 43 | 33 | 38 |
Large | 19 | 20 | 67 | 13 |
This example is similar to the above. However, it calls numpy.absolute() to change negative values to positive (absolute) values. The output remains the same.
Code - Example 2
df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [44, 43, -20], 'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]})
index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_
result = np.absolute(df_inv)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Lines [2-3] create and set the index for the DataFrame (Small/Medium/Large).
- Line [4] uses np.absolute() to convert any negative values to positive (absolute) values. The output saves to the result variable.
- Line [5] outputs the result to the terminal. The output is identical to the example above.
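To confirm that the two approaches really do agree, the following minimal sketch compares both results with DataFrame.equals(). It reuses the inventory data from the examples above; the names via_method, via_numpy, and same are illustrative only.

```python
import numpy as np
import pandas as pd

df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [44, 43, -20],
                       'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]},
                      index=['Small', 'Medium', 'Large'])

via_method = df_inv.abs()        # DataFrame method
via_numpy = np.absolute(df_inv)  # NumPy function applied to the DataFrame

# equals() compares shape, labels, and every element of the two DataFrames
same = via_method.equals(via_numpy)
print(same)
```

If both calls behave as described, same is True and neither result contains a negative value.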
DataFrame all()
The all() method determines if all elements over a specified axis resolve to True.
The syntax for this method is as follows:
DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameters | Description |
---|---|
axis | If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row. |
bool_only | Includes only Boolean DataFrame columns. If None , this parameter will attempt to use everything. Not supported for Series. |
skipna | This parameter excludes NaN/NULL values. If the entire row/column is NaN and skipna=True , the result is True , the same as for an empty row/column. If skipna=False , then NaN is treated as True because it is not equal to 0. |
level | If the axis is a MultiIndex (hierarchical), count along a specific level and collapse into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0. |
**kwargs | Additional keywords have no effect. |
For this example, the Rivers Clothing Warehouse Manager needs to find out what is happening with the inventory for Tanks. Something is amiss!
Code - Example 1
df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [0, 0, -20], 'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]})
result = df_inv.Tanks.all(skipna=False)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Line [2] checks all elements of Tanks and saves True/False to the result variable.
- Line [3] outputs the result to the terminal.
Output
False
The above example checked only the Tanks column. Calling all() on the DataFrame itself checks every column and returns one True/False value per column.
Code - Example 2
df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [0, 0, -20], 'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]})
result = df_inv.all()
print(result)
Output
Tops | True |
Tanks | False |
Pants | True |
Sweats | True |
dtype: bool |
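The axis parameter changes the direction of the check. As a small sketch using the same inventory data, axis=1 tests each row instead of each column, and axis=None reduces the whole DataFrame to a single value. The names by_row and everything are illustrative only.

```python
import pandas as pd

df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [0, 0, -20],
                       'Pants': [61, -33, 67], 'Sweats': [88, 38, 13]})

# One result per row: True only if every value in that row is non-zero
by_row = df_inv.all(axis=1)

# One result for the entire DataFrame
everything = df_inv.all(axis=None)

print(by_row)
print(everything)
```

Rows 0 and 1 each contain a zero in Tanks, so only row 2 resolves to True, and the DataFrame as a whole resolves to False.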
DataFrame any()
The any() method evaluates the elements on a specified axis to determine if any of them resolves to True. This method returns True if at least one element on the axis is True or equivalent (non-zero or non-empty). Otherwise, it returns False.
The syntax for this method is as follows:
DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
Parameters | Description |
---|---|
axis | If zero (0) or index is selected, apply to each column. Default 0. If one (1) apply to each row. |
bool_only | Includes only Boolean DataFrame columns. If None , this parameter will attempt to use everything. Not supported for Series. |
skipna | This parameter excludes NaN/NULL values. If the entire row/column is NaN and skipna=True , the result is False , the same as for an empty row/column. If skipna=False , then NaN is treated as True because it is not equal to 0. |
level | If the axis is a MultiIndex (hierarchical), count along a specific level and collapse into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0. |
**kwargs | Additional keywords have no effect. |
For this example, Rivers Clothing assumes every item in their inventory contains a valid value. To confirm this, run the following code.
df_inv = pd.DataFrame({'Tops': [36, 23, 0], 'Tanks': [10, 20, 0], 'Pants': [61, 33, 0], 'Sweats': [88, 38, 0]})
index_ = ['Small', 'Medium', 'Large']
df_inv.index = index_
result = df_inv.any(axis='columns')
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_inv.
- Lines [2-3] create and set the index for the DataFrame (Small/Medium/Large).
- Line [4] checks whether any element in each row is non-zero and saves the outcome to the result variable.
- Line [5] outputs the result to the terminal.
Output
There is an issue with the Large size of all items in inventory. They all contain zero values.
Small | True |
Medium | True |
Large | False |
dtype: bool |
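The skipna parameter is easiest to see on data that holds nothing but NaN values. The sketch below, which uses a made-up Series named all_nan, runs any() and all() with and without skipna.

```python
import numpy as np
import pandas as pd

# Illustrative data: a Series that contains only missing values
all_nan = pd.Series([np.nan, np.nan])

any_skip = all_nan.any()              # NaN skipped, so nothing is left to be True
any_keep = all_nan.any(skipna=False)  # NaN kept and treated as True (non-zero)
all_skip = all_nan.all()              # NaN skipped, same as an empty Series
all_keep = all_nan.all(skipna=False)  # NaN kept and treated as True (non-zero)

print(any_skip, any_keep, all_skip, all_keep)
```

Only the first call returns False. With skipna=True the Series is effectively empty, and an empty Series gives False for any() but True for all().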
DataFrame clip()
The clip() method limits values to a range: any value below the lower threshold or above the upper threshold is set to that threshold. Thresholds can be singular values or array-like, and in the latter case, the clipping is performed element-wise in the specified axis.
The syntax for this method is as follows:
DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
Parameter | Description |
---|---|
lower | This parameter is the minimum threshold value. By default, the value is None . |
upper | This parameter is the maximum threshold value. By default, the value is None . |
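axis | This parameter aligns the object with lower and upper along the specified axis. It matters when a threshold is array-like: zero (0) or index matches one threshold to each row, and one (1) or columns matches one threshold to each column. Default None. |
inplace | If True , the operation modifies the DataFrame in place instead of returning a new object. By default, the value is False . |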
*args | – |
**kwargs | Additional keywords have no effect. |
For this example, Rivers Clothing is having a sale on Pants in sizes Medium and Large. Unfortunately, these prices are greater than the sale price of $25.00 and need to be modified.
df_prices = pd.DataFrame({'Tops': [10.22, 12.45, 17.45], 'Tanks': [9.99, 10.99, 11.99], 'Pants': [24.95, 26.95, 32.95], 'Sweats': [18.99, 19.99, 21.99]})
index_ = ['Small', 'Medium', 'Large']
df_prices.index = index_
result = df_prices.clip(10, 25, axis='rows')
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_prices.
- Lines [2-3] create and set the index for the DataFrame (Small/Medium/Large).
- Line [4] checks each element against the lower (10) and upper (25) limits and updates it accordingly. The output saves to the result variable.
- Line [5] outputs the result to the terminal.
Output
 | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Small | 10.22 | 10.00 | 24.95 | 18.99 |
Medium | 12.45 | 10.99 | 25.00 | 19.99 |
Large | 17.45 | 11.99 | 25.00 | 21.99 |
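A scalar lower limit applies to every element, which is why the Small Tanks price rises from 9.99 to 10.00 in the output above. The sketch below shows two ways to apply only the sale cap, under assumptions of our own: a hypothetical per-size Series of caps passed with axis=0 (the 100.00 cap for Small is simply a value high enough to change nothing), and a clip restricted to the Pants column.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops': [10.22, 12.45, 17.45],
                          'Tanks': [9.99, 10.99, 11.99],
                          'Pants': [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]},
                         index=['Small', 'Medium', 'Large'])

# Hypothetical caps: one upper threshold per size, aligned with the index
caps = pd.Series([100.00, 25.00, 25.00], index=['Small', 'Medium', 'Large'])
capped = df_prices.clip(upper=caps, axis=0)

# Alternative: clip a single column and leave the others untouched
pants_only = df_prices.copy()
pants_only['Pants'] = pants_only['Pants'].clip(upper=25.00)

print(capped)
print(pants_only)
```

In both results the Medium and Large Pants prices become 25.00, while the Small Tanks price stays at 9.99 because no lower limit is applied.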
DataFrame corr()
The corr() method computes pair-wise correlation of columns. This does not include NaN and NULL values.
The syntax for this method is as follows:
DataFrame.corr(method='pearson', min_periods=1)
Parameter | Description |
---|---|
method | The possible correlation methods are: 'pearson' (standard correlation coefficient, the default); 'kendall' (Kendall Tau correlation coefficient); 'spearman' (Spearman rank correlation); or a callable that takes two (2) 1D ndarrays and returns a float. |
min_periods | The minimum number of observations required per pair of columns to have a valid result. This option is only available for the Pearson and Spearman correlations. |
df_prices = pd.DataFrame({'Tops': [10.22, 12.45, 17.45], 'Tanks': [9.99, 10.99, 11.99], 'Pants': [24.95, 26.95, 32.95], 'Sweats': [18.99, 19.99, 21.99]})
result = df_prices.corr()
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df_prices.
- Line [2] applies the correlation method. The output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output
 | Tops | Tanks | Pants | Sweats |
---|---|---|---|---|
Tops | 1.000000 | 0.976398 | 0.997995 | 0.999620 |
Tanks | 0.976398 | 1.000000 | 0.960769 | 0.981981 |
Pants | 0.997995 | 0.960769 | 1.000000 | 0.995871 |
Sweats | 0.999620 | 0.981981 | 0.995871 | 1.000000 |
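The two parameters can be explored with the same price data. In this minimal sketch, method='spearman' ranks the values before correlating them, and min_periods=4 asks for more observations than the three rows available. The names spearman and too_few are illustrative only.

```python
import numpy as np
import pandas as pd

df_prices = pd.DataFrame({'Tops': [10.22, 12.45, 17.45],
                          'Tanks': [9.99, 10.99, 11.99],
                          'Pants': [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]})

# Rank-based correlation: every column rises from row to row,
# so all ranks match across columns
spearman = df_prices.corr(method='spearman')

# Require four observations per pair of columns; only three exist
too_few = df_prices.corr(min_periods=4)

print(spearman)
print(too_few)
```

Because every column increases in the same order, the Spearman coefficients should all come out as 1.0, and the min_periods=4 result should contain only NaN.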
DataFrame corrwith()
The corrwith() method computes the pair-wise correlation between the rows or columns of a DataFrame and those of another Series or DataFrame. The Finxter Academy has a detailed article on this method.
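As a brief sketch of the idea, the code below correlates every column of the price data from the corr() example with the Tops column, passed in as a Series. The name vs_tops is illustrative only.

```python
import pandas as pd

df_prices = pd.DataFrame({'Tops': [10.22, 12.45, 17.45],
                          'Tanks': [9.99, 10.99, 11.99],
                          'Pants': [24.95, 26.95, 32.95],
                          'Sweats': [18.99, 19.99, 21.99]})

# Correlate each column with a single Series; the result has one value per column
vs_tops = df_prices.corrwith(df_prices['Tops'])
print(vs_tops)
```

The result is a Series rather than a matrix, and its values match the Tops row of the corr() output above.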
Further Learning Resources
This is Part 1 of the DataFrame method series.
Also, have a look at the Pandas DataFrame methods cheat sheet!