**💡 Problem Formulation:** When working with data in Python, Pandas DataFrames are a common structure to store tabular data. Often, a quick summary of the statistics for each column in a DataFrame helps provide insights. As a Python data analyst, you might have a DataFrame containing multiple rows and columns and wish to find a collective summary, such as count, mean, standard deviation, min, and max for each numerical column. This article explores five different ways to achieve this.

## Method 1: Using the `describe()` Function

The `describe()` function in Pandas is a convenient tool to get a quick overview of the statistical summaries for each numeric column in a DataFrame. It generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values. The function returns a DataFrame with summaries, including count, mean, std, min, max, and percentile values.

Here’s an example:

```
import pandas as pd

# Sample DataFrame
data = {'scores': [88, 92, 100, 85, 90, 87],
        'time_spent': [43, 45, 50, 40, 42, 39]}
df = pd.DataFrame(data)

# Using describe() to summarize statistics
summary = df.describe()
print(summary)
```

Output:

```
           scores  time_spent
count    6.000000    6.000000
mean    90.333333   43.166667
std      5.206640    3.904949
min     85.000000   39.000000
25%     87.250000   40.500000
50%     89.000000   42.500000
75%     91.500000   44.750000
max    100.000000   50.000000
```

The `describe()` function quickly provided us with a statistical summary table, where we can easily compare metrics like mean scores and time spent on a sample activity. It's particularly useful for getting a broad picture of the data range and distribution without manually computing each statistic.
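If the default quartiles are not the percentiles you need, `describe()` also accepts a `percentiles` parameter. The following sketch reuses the same sample data to request the 10th and 90th percentiles instead:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'scores': [88, 92, 100, 85, 90, 87],
                   'time_spent': [43, 45, 50, 40, 42, 39]})

# Request the 10th and 90th percentiles instead of the default quartiles
summary = df.describe(percentiles=[0.1, 0.9])
print(summary)
```

Note that the median (50%) is always included in the output alongside whatever percentiles you request.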

## Method 2: Using the `info()` Function

The `info()` function in Pandas is typically used to get a concise summary of a DataFrame. While it is not a statistical summary in the strict sense, it provides essential information like the number of non-null entries, data type of each column, and memory usage, which can be invaluable for preliminary data analysis.

Here’s an example:

```
# Using info() to print a concise summary of the DataFrame
# (note: info() prints directly and returns None, so there is
# no need to assign its result to a variable)
df.info()
```

Output:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   scores      6 non-null      int64
 1   time_spent  6 non-null      int64
dtypes: int64(2)
memory usage: 224.0 bytes
```

The `info()` function provides an immediate check for missing values and the data type of each column. This can be very important before you start any sort of data manipulation or statistical analysis.
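When `info()` hints at missing values, a common follow-up is to count them explicitly per column with `isna().sum()`. A minimal sketch, using a hypothetical variant of the sample data with one missing score:

```python
import pandas as pd
import numpy as np

# Hypothetical frame with one missing value in 'scores'
df = pd.DataFrame({'scores': [88, 92, np.nan, 85],
                   'time_spent': [43, 45, 50, 40]})

# Count missing values per column
missing = df.isna().sum()
print(missing)
```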

## Method 3: Using Aggregate Functions

Aggregate functions in Pandas, such as `mean()`, `std()`, and `sum()`, allow you to compute specific statistics for each column. You can apply multiple aggregate functions at once using the `agg()` method to get a summary with selected statistics.

Here's an example:

```
# Applying multiple aggregate functions
aggregated_stats = df.agg(['mean', 'std', 'min', 'max'])
# Displaying the aggregated statistics
print(aggregated_stats)
```

Output:

```
          scores  time_spent
mean   90.333333   43.166667
std     5.206640    3.904949
min    85.000000   39.000000
max   100.000000   50.000000
```

The `agg()` method provides flexibility in selecting and calculating only the statistics that are relevant to your analysis, which helps to keep the summary focused and concise.
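`agg()` also accepts a dict mapping column names to lists of functions, which lets you request different statistics per column. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'scores': [88, 92, 100, 85, 90, 87],
                   'time_spent': [43, 45, 50, 40, 42, 39]})

# Different statistics per column via a dict;
# cells for statistics not requested for a column are filled with NaN
per_column = df.agg({'scores': ['mean', 'max'], 'time_spent': ['min', 'std']})
print(per_column)
```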

## Method 4: Using the `value_counts()` Function

The `value_counts()` function in Pandas is used for categorical data to count the frequency of each unique value in a column. This function can offer a different perspective on data by highlighting the distribution of categorical variables.

Here's an example:

```
categories = ['High', 'Medium', 'Low', 'Medium', 'High', 'Low']
df['performance'] = categories
# Using value_counts() to summarize the distribution of a categorical column
category_distribution = df['performance'].value_counts()
print(category_distribution)
```

Output:

```
Medium    2
High      2
Low       2
Name: performance, dtype: int64
```

The `value_counts()` function is straightforward and very useful for understanding the frequency and distribution of categorical data within a DataFrame, which is often required before applying more complex statistical techniques.
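If relative frequencies are more useful than raw counts, `value_counts()` takes a `normalize=True` argument. A small sketch on the same categorical data:

```python
import pandas as pd

performance = pd.Series(['High', 'Medium', 'Low', 'Medium', 'High', 'Low'])

# normalize=True turns counts into proportions that sum to 1
freqs = performance.value_counts(normalize=True)
print(freqs)
```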

## Bonus One-Liner Method 5: Using the `apply()` Function

The `apply()` function in Pandas can be used to apply a function along an axis of the DataFrame. A quick one-liner to summarize statistics for each column could involve using `apply()` with a lambda function.

Here's an example:

```
# Using apply() with a lambda to summarize the numeric columns
# (select_dtypes skips the string 'performance' column added in Method 4,
# which would otherwise break the numeric computations)
one_liner_summary = df.select_dtypes('number').apply(lambda x: {'mean': x.mean(), 'std': x.std()})
# Displaying the summary
print(one_liner_summary)
```

Output:

```
scores {'mean': 90.33333333333333, 'std': 5.206640...
time_spent {'mean': 43.166666666666664, 'std': 3.90494...
dtype: object
```

This one-liner with `apply()` offers a quick and customizable way to calculate and view a selection of statistics of interest across all columns.
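If the dict-valued output above is hard to read, returning a `pd.Series` from the lambda instead makes `apply()` build a proper table with labeled rows. A small sketch, rebuilding the sample data so the snippet is self-contained:

```python
import pandas as pd

df = pd.DataFrame({'scores': [88, 92, 100, 85, 90, 87],
                   'time_spent': [43, 45, 50, 40, 42, 39]})

# Returning a Series per column yields a DataFrame with 'mean' and 'std' rows
summary = df.apply(lambda x: pd.Series({'mean': x.mean(), 'std': x.std()}))
print(summary)
```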

## Summary/Discussion

- **Method 1:** `describe()`. Provides a comprehensive statistical summary. May include more information than necessary for some purposes.
- **Method 2:** `info()`. Useful for a data-type and non-null-count overview. Does not provide statistical measures like mean and std.
- **Method 3:** Aggregate functions. Highly customizable. You must specify each statistic of interest.
- **Method 4:** `value_counts()`. Ideal for summarizing the distribution of categorical data. Not for numerical statistics.
- **Method 5:** `apply()`. Versatile and concise, perfect for customized statistics. More complex to read and understand than the other methods.
