How to Calculate the Column Standard Deviation of a DataFrame in Python Pandas?

Want to calculate the standard deviation of a column in your Pandas DataFrame?

In case it has been a few years since your last statistics course, let’s quickly recap the definitions: the variance is the average squared deviation of the elements from their mean, and the standard deviation is the square root of the variance.
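To make the definition concrete, here is a minimal sketch that computes the variance and the standard deviation by hand in plain Python. The list `ages` and the variable names are only illustrative; the values match the age column used in the rest of this article. The sketch shows both the population formula (divide by n) and the sample formula (divide by n - 1), since libraries differ in which one they use by default.

```python
import math

# Illustrative values (same as the 'age' column used in this article)
ages = [18, 22, 43]

n = len(ages)
mean = sum(ages) / n

# Squared deviation of each element from the mean
squared_devs = [(x - mean) ** 2 for x in ages]

# Population variance divides by n, sample variance divides by n - 1
population_var = sum(squared_devs) / n
sample_var = sum(squared_devs) / (n - 1)

# The standard deviation is the square root of the variance
population_std = math.sqrt(population_var)
sample_std = math.sqrt(sample_var)

print(round(sample_std, 4))
# 13.4288
```

The sample standard deviation printed here is the value Pandas reports for this data by default.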

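To compute it in Pandas, call the DataFrame’s std() method, i.e. df.std(), which calculates the sample standard deviation of each numeric column. You can then pick out the column you’re interested in from the result. Pass numeric_only=True so that non-numeric columns such as username are skipped; recent Pandas versions raise an error on text columns otherwise.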

import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

print(df)

Your DataFrame looks like this:

	username	age	income
0	Alice	18	100000
1	Bob	22	98000
2	Carl	43	111000

Here’s how you can calculate the standard deviation of all numeric columns:

print(df.std(numeric_only=True))

The output is the standard deviation of each numeric column:

age         13.428825
income    7000.000000
dtype: float64

To get the standard deviation of an individual column, access it using simple indexing:

print(df.std(numeric_only=True)['age'])
# approximately 13.4288
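As an alternative sketch, you can select the column first and then call std() on the resulting Series, so that only the columns you care about are processed. The DataFrame below simply repeats the example data from above; the variable names `age_std` and `subset_std` are illustrative.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'username': ['Alice', 'Bob', 'Carl'],
                   'age': [18, 22, 43],
                   'income': [100000, 98000, 111000]})

# Select one column (a Series) and compute its standard deviation
age_std = df['age'].std()

# Select several numeric columns and compute each one's standard deviation
subset_std = df[['age', 'income']].std()

print(round(age_std, 4))
# 13.4288
print(subset_std)
```

Selecting columns first avoids touching the text column at all, which may be convenient when a DataFrame mixes numeric and non-numeric data.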

Standard Deviation in NumPy Library

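NumPy, Python’s package for numerical computation, also offers solid statistics functionality. You can calculate basic statistics such as the average, median, variance, and standard deviation on NumPy arrays. Simply import the NumPy library and use the np.std(a) function to calculate the standard deviation of NumPy array a. Note that np.std() divides by n by default (the population standard deviation), whereas Pandas divides by n - 1 (the sample standard deviation), so the two libraries can return different values for the same data.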

Here’s the code:

import numpy as np

a = np.array([1, 2, 3])
print(np.std(a))
# 0.816496580927726
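NumPy and Pandas use different defaults for the ddof ("delta degrees of freedom") parameter: np.std() uses ddof=0, while the Pandas std() method uses ddof=1. The following sketch, using the illustrative list `ages` from the DataFrame example, shows how to make the two agree.

```python
import numpy as np
import pandas as pd

ages = [18, 22, 43]

# NumPy default: population standard deviation (ddof=0, divide by n)
np_population = np.std(ages)

# NumPy with ddof=1: sample standard deviation (divide by n - 1)
np_sample = np.std(ages, ddof=1)

# Pandas default: sample standard deviation (ddof=1)
pd_sample = pd.Series(ages).std()

# Pandas with ddof=0: population standard deviation
pd_population = pd.Series(ages).std(ddof=0)

print(np.isclose(np_sample, pd_sample))
# True
print(np.isclose(np_population, pd_population))
# True
```

If you compare results across the two libraries, setting ddof explicitly in both calls is a simple way to avoid surprises.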

Where to Go From Here?

Before you can become a data science master, you first need to master Python. Join my free Python email course and receive your daily Python lesson directly in your INBOX. It’s fun!
