The *Pandas DataFrame* has several methods concerning **Computations** and **Descriptive Stats**. When applied to a *DataFrame*, these methods evaluate the elements and return the results.

- **Part 1** focuses on the DataFrame methods `abs()`, `all()`, `any()`, `clip()`, `corr()`, and `corrwith()`.
- **Part 2** focuses on the DataFrame methods `count()`, `cov()`, `cummax()`, `cummin()`, `cumprod()`, and `cumsum()`.
- **Part 3** focuses on the DataFrame methods `describe()`, `diff()`, `eval()`, and `kurtosis()`.
- **Part 4** focuses on the DataFrame methods `mad()`, `min()`, `max()`, `mean()`, `median()`, and `mode()`.
- **Part 5** focuses on the DataFrame methods `pct_change()`, `quantile()`, `rank()`, `round()`, `prod()`, and `product()`.
- **Part 6** focuses on the DataFrame methods `add_prefix()`, `add_suffix()`, and `align()`.
- **Part 7** focuses on the DataFrame methods `at_time()`, `between_time()`, `drop()`, `drop_duplicates()`, and `duplicated()`.
- **Part 8** focuses on the DataFrame methods `equals()`, `filter()`, `first()`, `last()`, `head()`, and `tail()`.


## Getting Started

Remember to add the *Required Starter Code* to the top of each code snippet. This snippet will allow the code in this article to run error-free.

**Required Starter Code**

    import pandas as pd
    import numpy as np

Before any data manipulation can occur, two libraries need to be installed.

- The `pandas` library enables access to/from a *DataFrame*.
- The `numpy` library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal. At the command prompt, execute the commands below. For the terminal used in this example, the command prompt is a dollar sign (`$`). Your terminal prompt may be different.

    $ pip install pandas

Hit the `<Enter>` key on the keyboard to start the installation process.

    $ pip install numpy

Hit the `<Enter>` key on the keyboard to start the installation process.


If the installations were successful, a message displays in the terminal indicating the same.

## DataFrame mad()

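The `mad()` method (*Mean Absolute Deviation*) returns the average absolute distance of the values from their mean, calculated along the requested axis. This method was deprecated in pandas 1.5 and removed in pandas 2.0, so the example in this section requires an earlier pandas version.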


The syntax for this method is as follows:

DataFrame.mad(axis=None, skipna=None, level=None)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If this parameter is `True`, any `NaN`/NULL value(s) are ignored. If `False`, all value(s) are included: valid or empty. If no value is passed, `None` is assumed. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value is passed, `None` is assumed. |

This example retrieves the MAD of four (4) Hockey Teams.

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 10], 'Leafs': [2, 7, 11], 'Flames': [1, 8, 12]})
    result = df_teams.mad(axis=0).apply(lambda x: round(x, 3))
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses the `mad()` method with `axis=0` to calculate the MAD of each column. The lambda function rounds the output to three (3) decimal places. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    Bruins    2.000
    Oilers    2.444
    Leafs     3.111
    Flames    4.000
    dtype: float64
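Since `mad()` is not available in pandas 2.0 and later, the same quantity can be calculated by hand. The following is a minimal sketch, assuming the goal is the column-wise mean absolute deviation around the mean. The variable names `deviations` and `manual_mad` are illustrative and not part of the original example.

```python
import pandas as pd

df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 10],
                         'Leafs': [2, 7, 11], 'Flames': [1, 8, 12]})

# Absolute distance of every value from its own column mean
deviations = (df_teams - df_teams.mean(axis=0)).abs()

# Average those distances per column, rounded to three decimal places
manual_mad = deviations.mean(axis=0).round(3)
print(manual_mad)
```

This should reproduce the values shown in the output above for the same data.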

## DataFrame min()

The `min()` method returns the smallest value(s) from a DataFrame/Series. The following can accomplish this task:

- The `DataFrame.min()` method, which reduces the values along an axis, or
- The `numpy.minimum()` function, which compares two arrays element by element

The syntax for this method is as follows:

DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If this parameter is `True`, any `NaN`/NULL value(s) are ignored. If `False`, all value(s) are included: valid or empty. If no value is passed, `None` is assumed. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value is passed, `None` is assumed. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`**kwargs` | This is where you can add additional keywords. |

For this example, we will determine the smallest value among each Team's wins, losses, and ties.

**Code Example 1**:

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 14], 'Leafs': [2, 7, 11], 'Flames': [21, 8, 7]})
    result = df_teams.min(axis=0)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses the `min()` method with `axis=0` to retrieve the minimum value of each column. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    Bruins    4
    Oilers    3
    Leafs     2
    Flames    7
    dtype: int64
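The `axis` parameter changes the direction of the comparison. As a sketch, assuming the three rows hold wins, losses, and ties (the row labels below are added for illustration and are not part of the original example), `axis=1` returns the smallest value in each row, and `idxmin(axis=1)` reports which Team holds it.

```python
import pandas as pd

df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 14],
                         'Leafs': [2, 7, 11], 'Flames': [21, 8, 7]},
                        index=['wins', 'losses', 'ties'])

# Smallest value in each row
row_min = df_teams.min(axis=1)

# Column label (Team) where that smallest value occurs
row_min_team = df_teams.idxmin(axis=1)

print(row_min)
print(row_min_team)
```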

This example uses two (2) lists and retrieves the smaller value at each position.

**Code Example 2:**

    c11_grades = [63, 78, 83, 93]
    c12_grades = [73, 84, 79, 83]
    result = np.minimum(c11_grades, c12_grades)
    print(result)

- Lines [1-2] create lists of random grades and assign them to the appropriate variables.
- Line [3] uses the NumPy `minimum()` function to compare the two (2) lists element by element. This output saves to the `result` variable.
- Line [4] outputs the result to the terminal.

**Output:**

[63 78 79 83]

## DataFrame max()

The `max()` method returns the largest value(s) from a DataFrame/Series. The following can accomplish this task:

- The `DataFrame.max()` method, which reduces the values along an axis, or
- The `numpy.maximum()` function, which compares two arrays element by element

The syntax for this method is as follows:

DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If this parameter is `True`, any `NaN`/NULL value(s) are ignored. If `False`, all value(s) are included: valid or empty. If no value is passed, `None` is assumed. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value is passed, `None` is assumed. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`**kwargs` | This is where you can add additional keywords. |

For this example, we will determine the largest value among each Team's wins, losses, and ties.

**Code Example 1:**

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 14], 'Leafs': [2, 7, 11], 'Flames': [21, 8, 7]})
    result = df_teams.max(axis=0)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses `max()` with `axis=0` to retrieve the maximum value of each column. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    Bruins     9
    Oilers    14
    Leafs     11
    Flames    21
    dtype: int64

This example uses two (2) lists and retrieves the larger value at each position.

**Code Example 2:**

    c11_grades = [63, 78, 83, 93]
    c12_grades = [73, 84, 79, 83]
    result = np.maximum(c11_grades, c12_grades)
    print(result)

- Lines [1-2] create lists of random grades and assign them to the appropriate variables.
- Line [3] uses the NumPy `maximum()` function to compare the two (2) lists element by element. This output saves to the `result` variable.
- Line [4] outputs the result to the terminal.

**Output:**

[73 84 83 93]
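`numpy.maximum()` and `DataFrame.max()` answer different questions: the first compares two arrays position by position, while the second reduces one set of values to its largest member. Here is a small sketch of the difference, reusing the grade lists from Code Example 2. The `grades` DataFrame and its column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

c11_grades = [63, 78, 83, 93]
c12_grades = [73, 84, 79, 83]

# Element-wise: the larger grade at each position
pairwise = np.maximum(c11_grades, c12_grades)

# Reduction: the single largest grade in each column
grades = pd.DataFrame({'c11': c11_grades, 'c12': c12_grades})
per_column = grades.max(axis=0)

print(pairwise)
print(per_column)
```

The first result has one entry per position, and the second has one entry per column.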

## DataFrame mean()

The `mean()` method returns the average of the DataFrame/Series across a requested axis. When called on a DataFrame, it returns a Series. When called on a Series, it returns a single number (float).

The following methods can accomplish this task:

- The `DataFrame.mean()` method, or
- The `Series.mean()` method

The syntax for this method is as follows:

DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If this parameter is `True`, any `NaN`/NULL value(s) are ignored. If `False`, all value(s) are included: valid or empty. If no value is passed, `None` is assumed. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value is passed, `None` is assumed. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`**kwargs` | This is where you can add additional keywords. |
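The `numeric_only` parameter matters when a DataFrame mixes numbers and text. The sketch below assumes a hypothetical `df_stats` DataFrame with one text column. With `numeric_only=True`, that column is left out of the result.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df_stats = pd.DataFrame({'team': ['Bruins', 'Oilers', 'Leafs'],
                         'wins': [4, 3, 2],
                         'losses': [5, 6, 7]})

# Only the numeric columns take part in the calculation
result = df_stats.mean(numeric_only=True)
print(result)
```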

For this example, we will determine average wins, losses and ties for our Hockey Teams.

**Code Example 1:**

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 14], 'Leafs': [2, 7, 11], 'Flames': [21, 8, 7]})
    result = df_teams.mean(axis=0).apply(lambda x: round(x, 2))
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses the `mean()` method with `axis=0` to calculate the mean (average) of each column. The lambda function rounds the output to two (2) decimal places. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    Bruins     6.00
    Oilers     7.67
    Leafs      6.67
    Flames    12.00
    dtype: float64

For this example, Alice Accord, an employee of Rivers Clothing, has logged her hours. Let's calculate the mean (average) of the logged values.

**Code Example 2:**

    hours = pd.Series([40.5, 37.5, 40, 55])
    result = hours.mean()
    print(result)

- Line [1] creates a Series of hours worked and saves it to `hours`.
- Line [2] uses the `mean()` method to calculate the mean (average). This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

43.25
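Missing values change the result depending on `skipna`. Here is a minimal sketch, assuming one of the logged values is missing. The `NaN` entry is invented for illustration.

```python
import numpy as np
import pandas as pd

hours = pd.Series([40.5, np.nan, 40, 55])

mean_skip = hours.mean()              # NaN is skipped, leaving three values
mean_keep = hours.mean(skipna=False)  # NaN propagates to the result

print(mean_skip)
print(mean_keep)
```

With the missing value skipped, the mean is taken over the three remaining entries. With `skipna=False`, the result itself is `NaN`.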

## DataFrame median()

The `median()` method calculates and returns the median of DataFrame/Series elements across a requested axis. In other words, the median determines the middle number(s) of the dataset.


The syntax for this method is as follows:

DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `None`. If one (1) is selected, apply the function to each row. |
`skipna` | If this parameter is `True`, any `NaN`/NULL value(s) are ignored. If `False`, all value(s) are included: valid or empty. If no value is passed, `None` is assumed. |
`level` | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value is passed, `None` is assumed. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`**kwargs` | This is where you can add additional keywords. |

For this example, we will determine the median value(s) for our Hockey Teams.

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 6, 14], 'Leafs': [2, 7, 11], 'Flames': [21, 8, 7]})
    result = df_teams.median(axis=0)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses the `median()` method to calculate the median of each Team. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

    Bruins    5.0
    Oilers    6.0
    Leafs     7.0
    Flames    8.0
    dtype: float64
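When a column has an even number of values, there is no single middle number, and pandas returns the mean of the two middle values. Here is a short sketch, with a hypothetical fourth value added to the Bruins data:

```python
import pandas as pd

odd_count = pd.Series([4, 5, 9])
even_count = pd.Series([4, 5, 9, 12])  # 12 is an invented extra value

print(odd_count.median())   # middle value of 4, 5, 9
print(even_count.median())  # mean of the two middle values, 5 and 9
```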

## DataFrame mode()

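The `mode()` method returns the most frequently occurring value(s) along the selected axis of a DataFrame/Series. If several values tie for the highest count, all of them are returned.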

The syntax for this method is as follows:

DataFrame.mode(axis=0, numeric_only=False, dropna=True)

Parameter | Description |
---|---|
`axis` | If zero (0) or `index` is selected, apply the function to each column. Default is `0`. If one (1) is selected, apply the function to each row. |
`numeric_only` | Only include columns that contain integers, floats, or boolean values. |
`dropna` | If set to `True`, this parameter ignores all `NaN` and `NaT` values. By default, this value is `True`. |

For this example, every value appears exactly once within its column, so all values tie as modes and `mode()` returns each column's values in sorted order.

    df_teams = pd.DataFrame({'Bruins': [4, 5, 9], 'Oilers': [3, 9, 13], 'Leafs': [2, 7, 4], 'Flames': [13, 9, 7]})
    result = df_teams.mode(axis=0)
    print(result)

- Line [1] creates a *DataFrame* from a dictionary of lists and saves it to `df_teams`.
- Line [2] uses the `mode()` method with `axis=0` to find the mode(s) of each column. This output saves to the `result` variable.
- Line [3] outputs the result to the terminal.

**Output:**

       Bruins  Oilers  Leafs  Flames
    0       4       3      2       7
    1       5       9      4       9
    2       9      13      7      13
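The effect of `mode()` is easier to see when some values repeat. In this sketch (the data is made up for illustration), `4` appears twice in the Bruins column, while every Leafs value appears once. Columns with fewer modes than the longest column are padded with `NaN`.

```python
import pandas as pd

# Hypothetical data: Bruins has a repeated value, Leafs does not
df_repeat = pd.DataFrame({'Bruins': [4, 5, 4], 'Leafs': [2, 7, 4]})

result = df_repeat.mode(axis=0)
print(result)
```

The Bruins column has a single mode followed by two `NaN` entries, and the Leafs column lists all three of its values in sorted order.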

