How to Calculate the Column Variance of a DataFrame in Python Pandas?


Want to calculate the variance of a column in your Pandas DataFrame?

If it has been a few years since your last statistics course, here’s a quick recap: the variance is the average squared deviation of the values from their mean. By default, Pandas computes the sample variance, which divides the sum of squared deviations by n - 1 rather than n.
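To make the definition concrete, here is a minimal sketch that computes the variance by hand in plain Python. It uses the three ages from the example DataFrame in this post; the variable names are chosen only for illustration.

```python
# Ages taken from the example DataFrame used in this post
ages = [18, 22, 43]

n = len(ages)
mean = sum(ages) / n

# Squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in ages]

# Population variance: divide by n (the plain "average" squared deviation)
population_variance = sum(squared_deviations) / n

# Sample variance: divide by n - 1 (the default in Pandas)
sample_variance = sum(squared_deviations) / (n - 1)

print(population_variance)
print(sample_variance)
```

The sample variance is always the larger of the two, and the gap shrinks as the number of values grows.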

You can calculate the variance of a Pandas DataFrame with the df.var() method, which computes the variance of each numeric column. You can then pick out the column you’re interested in from the result.

import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

print(df)

Your DataFrame looks like this:


  username  age  income
0    Alice   18  100000
1      Bob   22   98000
2     Carl   43  111000

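Here’s how you can calculate the variance of all numeric columns. Because the username column contains strings, pass numeric_only=True so that Pandas skips it (Pandas 2.0 and later raise a TypeError otherwise):

print(df.var(numeric_only=True))

The output is the variance of each numeric column: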

age       1.803333e+02
income    4.900000e+07
dtype: float64

To get the variance of an individual column, access it using simple indexing:

print(df.var(numeric_only=True)['age'])
# 180.33333333333334
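As an alternative, you can select the column first and call var() on it directly, which avoids computing the variance of columns you don’t need. The sketch below rebuilds the same example DataFrame so that it runs on its own. It also shows the ddof parameter: ddof=1 (the default) gives the sample variance, and ddof=0 gives the population variance.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'username': ['Alice', 'Bob', 'Carl'],
                   'age': [18, 22, 43],
                   'income': [100000, 98000, 111000]})

# Variance of a single column (a Series); sample variance by default
age_sample_var = df['age'].var()

# Population variance: divide by n instead of n - 1
age_population_var = df['age'].var(ddof=0)

# Variance of an explicit subset of numeric columns
selected = df[['age', 'income']].var()

print(age_sample_var)
print(age_population_var)
print(selected)
```

Selecting the numeric columns explicitly, as in the last call, is another way to keep the string column out of the computation.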

