**💡 Problem Formulation:** When working with data in Python, you often need statistical metrics that describe the variability within a dataset. The standard deviation quantifies how much a set of values spreads around its mean, and in data analysis you frequently need it for specific columns of a Pandas DataFrame rather than for the whole table. This article describes how to compute the standard deviation for selected columns: the input is a DataFrame, and the output is the standard deviation of each specified column.

## Method 1: Use the `std()` Function on a DataFrame

The `std()` function in Pandas computes the standard deviation of a Series or of each column in a DataFrame. On a Series of numeric data it returns a single value; on a DataFrame it returns one value per column. To restrict the calculation to specific columns, select them first and then call `std()` on the selection.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]})

# Calculating standard deviation for specific columns
std_deviation = df[['A', 'C']].std()
print(std_deviation)
```

Output:

```
A    1.290994
C    1.290994
dtype: float64
```

This code snippet creates a simple DataFrame with three columns, ‘A’, ‘B’, and ‘C’. We use the `std()` function to calculate the standard deviation of the specific columns ‘A’ and ‘C’. The output shows the standard deviation for each of these columns.
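One detail worth keeping in mind is that `std()` in Pandas defaults to `ddof=1`, the sample standard deviation, while NumPy's `np.std` defaults to `ddof=0`, the population standard deviation. The sketch below reuses the column values from the example above to compare the two; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]})

# Sample standard deviation (divides by n - 1); this is the pandas default
sample_std = df[['A', 'C']].std()

# Population standard deviation (divides by n)
population_std = df[['A', 'C']].std(ddof=0)

# NumPy's default corresponds to ddof=0
numpy_std = np.std(df['A'].to_numpy())

print(sample_std['A'], population_std['A'], numpy_std)
```

If results from Pandas and NumPy disagree slightly on the same data, a differing `ddof` is a likely cause.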

## Method 2: Subset DataFrame Before Using `std()`

Another approach is to subset your DataFrame to include only the columns of interest and then apply the `std()` function to the resulting DataFrame. This avoids accidentally including extra columns in your calculation.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [10, 11, 12, 13], 'B': [13, 14, 15, 16], 'C': [16, 17, 18, 19]})

# Subset DataFrame and calculate standard deviation
selected_columns = df[['A', 'B']]
std_deviation = selected_columns.std()
print(std_deviation)
```

Output:

```
A    1.290994
B    1.290994
dtype: float64
```

By first creating a subset DataFrame that contains only columns ‘A’ and ‘B’, and then applying the `std()` function, we get the standard deviations for just those columns without using data from column ‘C’.
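Subsetting does not have to be done by name. As a minimal sketch, assuming a hypothetical DataFrame that mixes numeric and text columns (the `label` column is invented for illustration), `select_dtypes()` can build the subset by data type before `std()` is applied:

```python
import pandas as pd

# Hypothetical DataFrame mixing numeric and text columns
df = pd.DataFrame({
    'A': [10, 11, 12, 13],
    'B': [13, 14, 15, 16],
    'label': ['w', 'x', 'y', 'z'],
})

# Keep only the numeric columns, then compute the standard deviation
numeric_columns = df.select_dtypes(include='number')
std_deviation = numeric_columns.std()
print(std_deviation)
```

A closely related shortcut is `df.std(numeric_only=True)`, which skips non-numeric columns without creating an intermediate subset.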

## Method 3: Using the `agg()` Method to Compute Multiple Statistics

The `agg()` method in Pandas allows you to apply one or more operations over the specified axis. For standard deviation, you can use `agg()` combined with a dictionary that maps each column name to the statistic to compute for it.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'X': [20, 21, 19, 18], 'Y': [22, 23, 21, 20], 'Z': [25, 26, 24, 23]})

# Calculate standard deviation using the agg() method
std_deviation = df.agg({'X': 'std', 'Z': 'std'})
print(std_deviation)
```

Output:

```
X    1.290994
Z    1.290994
dtype: float64
```

This code passes a dictionary to the `agg()` method that requests the standard deviation for columns ‘X’ and ‘Z’ only. The result is a Series with the standard deviation values for the selected columns.
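Since the main appeal of `agg()` is computing several statistics at once, the following sketch extends the same sample data to request the mean alongside the standard deviation. The names `summary` and `per_column` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'X': [20, 21, 19, 18], 'Y': [22, 23, 21, 20], 'Z': [25, 26, 24, 23]})

# A list of function names yields one row per statistic
summary = df[['X', 'Z']].agg(['mean', 'std'])

# A dictionary of lists can request different statistics per column;
# combinations that were not requested are filled with NaN
per_column = df.agg({'X': ['mean', 'std'], 'Z': ['std']})

print(summary)
print(per_column)
```

In both cases the result is a DataFrame rather than a Series, with statistics as rows and the selected columns as columns.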

## Method 4: Using the `describe()` Method for Descriptive Statistics

The `describe()` method in Pandas provides descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution. While it returns multiple statistics, you can extract the standard deviation from the result.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Column1': [100, 110, 120, 130], 'Column2': [130, 140, 150, 160]})

# Using describe() to obtain descriptive statistics
description = df.describe()

# Extracting standard deviation
std_deviation = description.loc['std', ['Column1', 'Column2']]
print(std_deviation)
```

Output:

```
Column1    12.909944
Column2    12.909944
Name: std, dtype: float64
```

After getting the descriptive statistics with `describe()`, we extract the ‘std’ row, which contains the standard deviation for the specified columns, ‘Column1’ and ‘Column2’.
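To avoid summarizing columns you do not need, the selection can also happen before `describe()` is called. The sketch below, using the same sample data, does that for a single column and checks the result against a direct `std()` call; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [100, 110, 120, 130], 'Column2': [130, 140, 150, 160]})

# Describe only the column of interest, then pull out the 'std' row
std_from_describe = df[['Column1']].describe().loc['std']

# The same value computed directly, without the other statistics
std_direct = df[['Column1']].std()

print(std_from_describe['Column1'], std_direct['Column1'])
```

Both approaches report the sample standard deviation, so the two printed values should agree.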

## Bonus One-Liner Method 5: Use a Dictionary Comprehension and `std()`

For a quick and concise calculation, use a dictionary comprehension to apply the `std()` function to each column in a list of specified columns within the DataFrame.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'First': [2, 4, 6, 8], 'Second': [3, 6, 9, 12], 'Third': [4, 8, 12, 16]})

# Calculating standard deviation using a dictionary comprehension
std_deviation = {column: df[column].std() for column in ['First', 'Third']}
print(std_deviation)
```

Output:

```
{'First': 2.581988897471611, 'Third': 5.163977794943222}
```

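The dictionary comprehension iterates through the list of column names and calculates the standard deviation for each, creating a dictionary that maps each column name to its result. Depending on the installed NumPy version, the values may be displayed wrapped as `np.float64(...)`.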
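If a plain dictionary is the goal, a single `std()` call followed by `to_dict()` should give an equivalent result. The sketch below puts both variants side by side on the same sample data; the names `columns`, `by_comprehension`, and `by_to_dict` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'First': [2, 4, 6, 8], 'Second': [3, 6, 9, 12], 'Third': [4, 8, 12, 16]})
columns = ['First', 'Third']

# Dictionary comprehension, converting each result to a plain Python float
by_comprehension = {column: float(df[column].std()) for column in columns}

# Alternative: one std() call on the subset, then convert the Series to a dict
by_to_dict = df[columns].std().to_dict()

print(by_comprehension)
print(by_to_dict)
```

The comprehension is handy when each column needs slightly different handling, while the `to_dict()` variant keeps the work inside a single Pandas call.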

## Summary/Discussion

- **Method 1:** Direct application of `std()`. Great for simplicity and direct use cases. Limited to basic usage scenarios.
- **Method 2:** Subset before `std()`. Offers more control over the selected data. Requires an extra step of subsetting.
- **Method 3:** Use of `agg()`. Flexible for computing multiple statistics. Might be overkill for a single operation.
- **Method 4:** The `describe()` method. Provides a full overview of statistics. Inefficient if only the standard deviation is needed.
- **Method 5:** Dictionary comprehension. Quick and concise, ideal for one-liners. May become cumbersome with a large number of columns.