The *Pandas DataFrame* has several methods concerning **Computations** and **Descriptive Stats**. When applied to a *DataFrame*, these methods evaluate the elements and return the results.

- **Part 1** focuses on the DataFrame methods `abs()`, `all()`, `any()`, `clip()`, `corr()`, and `corrwith()`.
- **Part 2** focuses on the DataFrame methods `count()`, `cov()`, `cummax()`, `cummin()`, `cumprod()`, and `cumsum()`.
- **Part 3** focuses on the DataFrame methods `describe()`, `diff()`, `eval()`, and `kurtosis()`.
- **Part 4** focuses on the DataFrame methods `mad()`, `min()`, `max()`, `mean()`, `median()`, and `mode()`.
- **Part 5** focuses on the DataFrame methods `pct_change()`, `quantile()`, `rank()`, `round()`, `prod()`, and `product()`.
- **Part 6** focuses on the DataFrame methods `add_prefix()`, `add_suffix()`, and `align()`.
- **Part 7** focuses on the DataFrame methods `at_time()`, `between_time()`, `drop()`, `drop_duplicates()`, and `duplicated()`.
- **Part 8** focuses on the DataFrame methods `equals()`, `filter()`, `first()`, `last()`, `head()`, and `tail()`.


## Getting Started

Remember to add the *Required Starter Code* to the top of each code snippet. This snippet will allow the code in this article to run error-free.

**Required Starter Code**

    import pandas as pd
    import numpy as np

Before any data manipulation can occur, two libraries require installation.

- The `pandas` library enables access to/from a *DataFrame*.
- The `numpy` library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal. At the command prompt (`$`), execute the commands below. For the terminal used in this example, the command prompt is a dollar sign (`$`). Your terminal prompt may be different.

    $ pip install pandas

Hit the `<Enter>` key on the keyboard to start the installation process.

    $ pip install numpy

Hit the `<Enter>` key on the keyboard to start the installation process.


If the installations were successful, a message displays in the terminal indicating the same.
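One quick way to double-check the installation is to import both libraries and print their version numbers. This is only a minimal sketch; the versions printed on your machine will differ.

```python
import pandas as pd
import numpy as np

# If either import fails, the corresponding library is not installed
# in the Python environment that is running this script.
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```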

## DataFrame pct_change()

The `pct_change()` method calculates and returns the percentage change between the current and prior element(s) in a DataFrame. The change is expressed as a fraction (for example, `0.05` for 5%), and the return value has the same type as the caller.


The syntax for this method is as follows:

    DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)

Parameter | Description |
---|---|
`periods` | The number of rows to shift when forming the percentage change. The default is `1`. |
`fill_method` | Determines how `NaN` values are filled before the percentage change is calculated. |
`limit` | The number of consecutive `NaN` values to fill in the DataFrame before stopping. |
`freq` | The increment to use for time series data. |
`**kwargs` | Additional keyword arguments passed to the `shift()` method of the DataFrame/Series. |

This example calculates and returns the percentage change of three (3) fictitious stocks over three (3) months.

    df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87], 'DBL': [39.91, 41.46, 40.99], 'UXL': [44.01, 43.67, 41.98]}, index=['2021-10-01', '2021-11-01', '2021-12-01'])
    result = df.pct_change(axis='rows', periods=1)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df`.
- Line [2] uses the `pct_change()` method with a selected axis and period to calculate the change. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

| | ASL | DBL | UXL |
|---|---|---|---|
| 2021-10-01 | NaN | NaN | NaN |
| 2021-11-01 | -0.100370 | 0.038837 | -0.007726 |
| 2021-12-01 | -0.126835 | -0.011336 | -0.038699 |

💡 **Note**: The first row contains `NaN` values as there is no previous row.
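To see what `pct_change()` computes, it can help to reproduce it by hand. The sketch below reuses the stock data from the example and compares the method against the formula `current / previous - 1`, built with `shift()`. It also shows `periods=2`, which compares each row with the row two positions earlier. The variable names `manual` and `two_back` are illustrative only.

```python
import pandas as pd

# Same fictitious stock prices as in the example above.
df = pd.DataFrame(
    {"ASL": [18.93, 17.03, 14.87],
     "DBL": [39.91, 41.46, 40.99],
     "UXL": [44.01, 43.67, 41.98]},
    index=["2021-10-01", "2021-11-01", "2021-12-01"],
)

# Manual equivalent of pct_change(periods=1): current / previous - 1.
manual = df / df.shift(1) - 1

# periods=2 compares each row with the row two positions earlier,
# so the first two rows have no earlier value and contain NaN.
two_back = df.pct_change(periods=2)

print(manual.round(6))
print(two_back.round(6))
```

With `periods=2`, only the last row holds values, since it is the only row with a row two positions before it.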

## DataFrame quantile()

The `quantile()` method returns the values from a DataFrame/Series at the specified quantile and axis.

The syntax for this method is as follows:

    DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')

Parameter | Description |
---|---|
`q` | A value (or list of values) where `0 <= q <= 1`. This is the quantile(s) to calculate. |
`axis` | If zero (0) or `index` is selected, apply the function to each column. This is the default. If one (1) is selected, apply the function to each row. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`interpolation` | Determines how the value is estimated when the requested quantile falls between two data points: `linear`, `lower`, `higher`, `midpoint`, or `nearest`. |


This example uses the same stock DataFrame as noted above to determine the quantile(s).

    df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87], 'DBL': [39.91, 41.46, 40.99], 'UXL': [44.01, 43.67, 41.98]})
    result = df.quantile(0.15)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df`.
- Line [2] uses the `quantile()` method to calculate by setting the `q` (quantile) parameter to 0.15. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    ASL    15.518
    DBL    40.234
    UXL    42.487
    Name: 0.15, dtype: float64
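The effect of the `interpolation` parameter is easiest to see on a small dataset. The sketch below reuses the stock data and requests the same 0.15 quantile with `lower` and `higher`, which pick the neighbouring data point below or above the requested position instead of interpolating between them. It also passes a list to `q`, which returns one row per quantile. The variable names are illustrative only.

```python
import pandas as pd

# Same fictitious stock prices as in the example above.
df = pd.DataFrame({"ASL": [18.93, 17.03, 14.87],
                   "DBL": [39.91, 41.46, 40.99],
                   "UXL": [44.01, 43.67, 41.98]})

# With three values per column, q=0.15 falls between the two smallest values.
lower = df.quantile(0.15, interpolation="lower")    # the smaller neighbour
higher = df.quantile(0.15, interpolation="higher")  # the larger neighbour

# A list of quantiles returns a DataFrame indexed by the quantile values.
quartiles = df.quantile([0.25, 0.5, 0.75])

print(lower)
print(higher)
print(quartiles)
```

For the `ASL` column, `lower` returns 14.87 and `higher` returns 17.03, while the default `linear` setting returns a value between the two (15.518 in the example above).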

## DataFrame rank()

The `rank()` method returns a DataFrame/Series with the values ranked in order. The return value has the same type as the caller.

The syntax for this method is as follows:

    DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. This is the default. If one (1) is selected, apply the function to each row. |
`method` | Determines how to rank identical values: `average` assigns the average rank of the group; `min` assigns the lowest rank of the group; `max` assigns the highest rank of the group; `first` assigns ranks in the order the values appear in the array; `dense` works like `min`, but the rank always increases by one (1) between groups. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`na_option` | Determines how `NaN` values rank: `keep` assigns a `NaN` rank to `NaN` values; `top` assigns the lowest rank to any `NaN` values found; `bottom` assigns the highest rank to any `NaN` values found. |
`ascending` | Determines if the elements/values rank in ascending or descending order. |
`pct` | If set to `True`, the results will return in percentile form. By default, this value is `False`. |

For this example, a CSV file is read in, ranked on Population, and sorted. Download the `countries.csv` file and move it to the current working directory.

    df = pd.read_csv("countries.csv")
    df["Rank"] = df["Population"].rank()
    df.sort_values("Population", inplace=True)
    print(df)

- Line [1] reads in the `countries.csv` file and saves it to `df`.
- Line [2] ranks the Population column and appends the result as a new Rank column at the end of the DataFrame (`df`).
- Line [3] sorts the DataFrame by Population in ascending order.
- Line [4] outputs the result to the terminal.

**Output:**

| | Country | Capital | Population | Area | Rank |
|---|---|---|---|---|---|
| 4 | Poland | Warsaw | 38383000 | 312685 | 1.0 |
| 2 | Spain | Madrid | 47431256 | 498511 | 2.0 |
| 3 | Italy | Rome | 60317116 | 301338 | 3.0 |
| 1 | France | Paris | 67081000 | 551695 | 4.0 |
| 0 | Germany | Berlin | 83783942 | 357021 | 5.0 |
| 5 | Russia | Moscow | 146748590 | 17098246 | 6.0 |
| 6 | USA | Washington | 328239523 | 9833520 | 7.0 |
| 8 | India | Dheli | 1352642280 | 3287263 | 8.0 |
| 7 | China | Beijing | 1400050000 | 9596961 | 9.0 |
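The population data above has no ties, so the `method` parameter makes no visible difference there. The sketch below uses a small, made-up Series (`scores` is a hypothetical name, not part of the CSV example) with one tied pair to show how each `method` setting ranks identical values.

```python
import pandas as pd

# Hypothetical data with a tie: the value 20 appears twice.
scores = pd.Series([10, 20, 20, 30])

# One column per tie-breaking method, for side-by-side comparison.
ranks = pd.DataFrame({
    "average": scores.rank(method="average"),  # tied values share the mean rank
    "min": scores.rank(method="min"),          # tied values get the lowest rank
    "max": scores.rank(method="max"),          # tied values get the highest rank
    "first": scores.rank(method="first"),      # ties broken by order of appearance
    "dense": scores.rank(method="dense"),      # like min, but no gaps between groups
})

print(ranks)
```

The two tied values occupy positions 2 and 3, so they receive 2.5 with `average`, 2 with `min`, 3 with `max`, 2 and 3 with `first`, and 2 with `dense` (where the next value then ranks 3 rather than 4).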

## DataFrame round()

The `round()` method rounds the DataFrame output to a specified number of decimal places.

The syntax for this method is as follows:

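    DataFrame.round(decimals=0, *args, **kwargs)

Parameter | Description |
---|---|
`decimals` | Determines the number of decimal places to round the value(s) to. This can be an integer, or a dictionary/Series keyed by column name to round columns differently. |
`*args` | Additional positional arguments. These have no effect and are accepted for compatibility with NumPy. |
`**kwargs` | Additional keyword arguments. These have no effect and are accepted for compatibility with NumPy. |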

For this example, the Bank of Canada's mortgage rates over three (3) months are rounded to three (3) decimal places and displayed.

**Code Example 1:**

    df = pd.DataFrame([(2.3455, 1.7487, 2.198)], columns=['Month 1', 'Month 2', 'Month 3'])
    result = df.round(3)
    print(result)

- Line [1] creates a *DataFrame* complete with column names and saves to `df`.
- Line [2] rounds the mortgage rates to three (3) decimal places. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

| | Month 1 | Month 2 | Month 3 |
|---|---|---|---|
| 0 | 2.346 | 1.749 | 2.198 |

Another way to perform the same task is with a Lambda!

**Code Example 2:**

    df = pd.DataFrame([(2.3455, 1.7487, 2.198)], columns=['Month 1', 'Month 2', 'Month 3'])
    result = df.apply(lambda x: round(x, 3))
    print(result)

- Line [1] creates a *DataFrame* complete with column names and saves to `df`.
- Line [2] rounds the mortgage rates to three (3) decimal places using a Lambda. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

💡 **Note**: The output is identical to that of the above.
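The `decimals` parameter also accepts a dictionary, which allows a different precision for each column. The sketch below reuses the mortgage-rate data; the precisions chosen here are arbitrary and only for illustration. Columns that are not listed in the dictionary are left unchanged.

```python
import pandas as pd

# Same mortgage-rate data as in the examples above.
df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
                  columns=["Month 1", "Month 2", "Month 3"])

# Round 'Month 1' to one decimal place and 'Month 2' to two.
# 'Month 3' is not in the dictionary, so it is not rounded.
per_column = df.round({"Month 1": 1, "Month 2": 2})

print(per_column)
```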

## DataFrame prod() and product()

The `prod()` and `product()` methods are identical. Both return the product of the values of a requested axis.

The syntax for these methods is as follows:

    DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

    DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If set to `True`, this parameter excludes NaN/NULL values when calculating the result. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value, then `None` is assumed. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`min_count` | The minimum number of valid (non-NaN) values required to perform the calculation. If fewer are present, the result is `NaN`. |
`**kwargs` | Additional keyword arguments passed to the function. |

For this example, a DataFrame of sample numbers is created, and the product along the selected axis is returned.

    df = pd.DataFrame({'A': [2, 4, 6], 'B': [7, 3, 5], 'C': [6, 3, 1]})
    index_ = ['A', 'B', 'C']
    df.index = index_
    result = df.prod(axis=0)
    print(result)

- Line [1] creates a *DataFrame* complete with sample numbers and saves to `df`.
- Line [2-3] creates and sets the DataFrame index.
- Line [4] calculates the product along axis 0. This output saves to the `result` variable.
- Line [5] outputs the result to the terminal.

**Output:**

**Formula Example:** `2*4*6=48`

    A     48
    B    105
    C     18
    dtype: int64
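The example above multiplies down each column. The sketch below shows the other direction (`axis=1`, across each row) on the same data, and then uses a second, made-up DataFrame (`with_gaps` is a hypothetical name) to illustrate how `min_count` changes the result for a column that contains only `NaN` values.

```python
import pandas as pd
import numpy as np

# Same data and index as in the example above.
df = pd.DataFrame({"A": [2, 4, 6], "B": [7, 3, 5], "C": [6, 3, 1]},
                  index=["A", "B", "C"])

# axis=1 multiplies across each row: 2*7*6, 4*3*3, 6*5*1.
row_products = df.prod(axis=1)
print(row_products)

# Hypothetical data: column 'B' contains no valid values at all.
with_gaps = pd.DataFrame({"A": [2.0, np.nan], "B": [np.nan, np.nan]})

# By default NaN values are skipped, and the product of no values is 1.0.
default_result = with_gaps.prod()

# min_count=1 requires at least one valid value, otherwise the result is NaN.
strict_result = with_gaps.prod(min_count=1)

print(default_result)
print(strict_result)
```

The row products are 84, 36, and 30. For the all-`NaN` column, the default call returns 1.0, while `min_count=1` returns `NaN`, which makes missing data easier to spot.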

At university, I found my love of writing and coding. Both of which I was able to use in my career.

During the past 15 years, I have held a number of positions such as:

- Corporate Technical Writer for various software programs such as Navision and Microsoft CRM
- Corporate Trainer (staff of 30+)
- Programming Teacher at a College
- Implementation Specialist for Navision and Microsoft CRM
- Senior PHP coder writing custom software online programs