Pandas DataFrame describe() Method

Preparation

Before any data manipulation can occur, two (2) libraries must be installed.

The Pandas library provides the DataFrame structure and the tools to read, write, and manipulate it.
The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. In the terminal used for this example, the command prompt is a dollar sign ($); your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required libraries.

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import numpy as np
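As a quick sanity check after installing, the short sketch below imports both libraries and prints their versions. The variable name `smoke` is only an illustration and does not appear elsewhere in this article.

```python
import numpy as np
import pandas as pd

# Print the installed versions; an ImportError here means an install failed.
print('pandas', pd.__version__)
print('numpy', np.__version__)

# Tiny smoke test: build a one-column DataFrame and count its rows.
smoke = pd.DataFrame({'x': np.arange(3)})
print(len(smoke))  # 3
```

Knowing the installed versions is useful because some parameters of describe() differ between pandas releases.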

DataFrame describe()

The describe() method generates descriptive statistics for numeric and object Series, as well as for DataFrame column sets of mixed data types.

The syntax for this method is as follows (source):

DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Parameters	Description
`percentiles`	The percentiles to include in the output. All values should be between 0 and 1. The default is `[.25, .5, .75]`, which returns the 25th, 50th, and 75th percentiles. This optional parameter accepts a list-like of numbers.
`include`	This parameter is a whitelist of data types to include. Ignored for Series. Below are the available options. – `'all'`: All input columns will be included in the output. – A list-like of dtypes: Limits the results to the provided data types. – To limit the result to numeric types, submit `numpy.number`. – To limit it instead to object columns, submit the `numpy.object` data type (the built-in `object` in NumPy versions where that alias has been removed). – Strings can also be used in the style of `select_dtypes` (e.g. `df.describe(include=['O'])`). – To select pandas categorical columns, use `'category'`.
`exclude`	This parameter is a blacklist of data types to omit from the result. Ignored for Series. – To exclude numeric data types, submit `numpy.number`. – To exclude object columns, submit the `numpy.object` data type (the built-in `object` in NumPy versions where that alias has been removed). – Strings can also be used in the style of `select_dtypes` (e.g. `df.describe(exclude=['O'])`). – To exclude pandas categorical columns, use `'category'`.
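`datetime_is_numeric`	This parameter determines whether datetime columns are treated as numeric. By default, this parameter is `False`. Note that this parameter was removed in pandas 2.0, where datetime data is always summarized as numeric.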
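The parameters in the table can be tried out with a minimal sketch like the one below. The DataFrame `df` and its `wins` and `team` columns are hypothetical sample data, not part of the Teams example used later. The `team` column is given an explicit object dtype so that the selection by `'O'` behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column and one object (string) column.
df = pd.DataFrame({
    'wins': [4, 5, 9, 3],
    'team': pd.Series(['Bruins', 'Oilers', 'Leafs', 'Bruins'], dtype=object),
})

# Custom percentiles: adds '10%' and '90%' rows to the numeric summary.
custom = df.describe(percentiles=[.1, .9])
print(custom)

# Only numeric columns (also the default for a mixed DataFrame).
numeric_only = df.describe(include=[np.number])
print(numeric_only)

# Only object columns, using the select_dtypes-style string.
object_only = df.describe(include=['O'])
print(object_only)

# Exclude numeric columns, which leaves the object column.
no_numbers = df.describe(exclude=[np.number])
print(no_numbers)
```

The last two calls should describe the same column here, since the sample data holds only one non-numeric column.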

Also, consider this table from the docs:

Numeric Data	For numeric data, the result’s index will include `count`, `mean`, `std`, `min`, and `max`, as well as the lower, 50th, and upper percentiles. By default, the lower percentile is 25, and the upper percentile is 75. The 50th percentile is the same as the median.
Object Data	For object data (strings or timestamps), the result’s index will include `count`, `unique`, `top`, and `freq`. The `top` is the most common value. The frequency (`freq`) is the most common value’s frequency. Timestamps also include the first and last items.
Multiple Object Values	If multiple object values have the highest count, then the `count` and `top` results will be arbitrarily chosen from among those with the highest count.
Mixed Data Types	For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If `include='all'` is provided as an option, the result will include a union of attributes of each type.
Include & Exclude	These parameters can limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
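To illustrate the mixed data types behavior described in the table, here is a small sketch. The `wins` and `city` columns are made-up sample data, and the object dtype is set explicitly as an assumption to keep the behavior stable across pandas versions.

```python
import pandas as pd

# Hypothetical mixed DataFrame: one numeric and one object column.
df = pd.DataFrame({
    'wins': [4, 5, 9],
    'city': pd.Series(['Boston', 'Edmonton', 'Boston'], dtype=object),
})

# Default: only the numeric column is analyzed.
default_result = df.describe()
print(default_result)

# include='all': union of numeric and object statistics.
# Statistics that do not apply to a column show up as NaN.
all_result = df.describe(include='all')
print(all_result)

# On a Series, the summary depends only on the Series' own dtype.
series_result = df['city'].describe()
print(series_result)
```

In the `include='all'` output, rows such as `unique`, `top`, and `freq` are filled only for the object column, while `mean` and `std` are filled only for the numeric one.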

For this example, the same Teams DataFrame referred to in Part 2 of this series is used. The DataFrame below displays four (4) Hockey Teams’ stats: wins, losses, and ties.

df_teams = pd.DataFrame({'Bruins':   [4, 5, 9],
                         'Oilers':   [3, 6, 10],
                         'Leafs':    [2, 7, 11],
                         'Flames':   [1, 8, 12]})

result = df_teams.describe().apply(lambda x:round(x,2))
print(result)

Line [1] creates a DataFrame from a Dictionary of Lists and saves it to df_teams.
Line [2] uses the describe() method to retrieve additional analytical information. Using a lambda, it then formats the output to two (2) decimal places and saves it to the result variable.
Line [3] outputs the result to the terminal.

Output

	Bruins	Oilers	Leafs	Flames
count	3.00	3.00	3.00	3.00
mean	6.00	6.33	6.67	7.00
std	2.65	3.51	4.51	5.57
min	4.00	3.00	2.00	1.00
25%	4.50	4.50	4.50	4.50
50%	5.00	6.00	7.00	8.00
75%	7.00	8.00	9.00	10.00
max	9.00	10.00	11.00	12.00
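The figures in the output can be cross-checked against the individual pandas methods. The sketch below does this for the Flames column and also shows `DataFrame.round()` as a shorter alternative to the lambda. This is one possible way to verify the numbers, not a required step.

```python
import pandas as pd

df_teams = pd.DataFrame({'Bruins':   [4, 5, 9],
                         'Oilers':   [3, 6, 10],
                         'Leafs':    [2, 7, 11],
                         'Flames':   [1, 8, 12]})

summary = df_teams.describe()
flames = df_teams['Flames']

# std in describe() matches Series.std(), the sample standard deviation (ddof=1).
print(round(flames.std(), 2))   # 5.57
# Percentiles match Series.quantile() with its default linear interpolation.
print(flames.quantile(.75))     # 10.0

# DataFrame.round(2) produces the same table as the apply/lambda version.
rounded = summary.round(2)
via_lambda = summary.apply(lambda x: round(x, 2))
print(rounded.equals(via_lambda))
```

The 75th percentile of 1, 8, and 12 falls halfway between 8 and 12, which is where the value 10.00 in the Flames column comes from.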

