Want to calculate the variance of a column in your Pandas DataFrame?
In case your last statistics course was a few years ago, here's a quick recap: the variance is the average squared deviation of the values from their mean. Pandas computes the sample variance by default, so the sum of squared deviations is divided by n - 1 rather than n.
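To make the definition concrete, here is a minimal sketch that computes the variance by hand in plain Python. The list `ages` and the variable names are chosen for illustration only; the point is the difference between dividing by n - 1 (sample variance, the Pandas default) and dividing by n (population variance).

```python
# Three illustrative values (an assumption for this sketch)
ages = [18, 22, 43]

# Step 1: the mean of the values
mean = sum(ages) / len(ages)

# Step 2: squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in ages]

# Step 3a: sample variance divides by n - 1
sample_variance = sum(squared_deviations) / (len(ages) - 1)

# Step 3b: population variance divides by n
population_variance = sum(squared_deviations) / len(ages)

print(sample_variance)
print(population_variance)
```

The sample variance comes out at roughly 180.33 and the population variance at roughly 120.22, so the choice of divisor matters a lot for small samples.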


You can calculate the variance of a Pandas DataFrame with the df.var() method, which computes the variance of each numeric column. You can then select the column you're interested in from the result.
import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

print(df)
Your DataFrame looks like this:
  | username | age | income |
0 | Alice | 18 | 100000 |
1 | Bob | 22 | 98000 |
2 | Carl | 43 | 111000 |
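Here's how you can calculate the variance of all numeric columns. Pass numeric_only=True so that the non-numeric username column is skipped; since pandas 2.0, calling var() on a DataFrame that contains string columns raises an error instead of silently dropping them:
print(df.var(numeric_only=True))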
The output is the variance of all columns:
age       1.803333e+02
income    4.900000e+07
dtype: float64
To get the variance of an individual column, access it using simple indexing:
print(df.var(numeric_only=True)['age'])
# 180.33333333333334
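As an alternative, you can select the column first and call var() on the resulting Series, which avoids computing the variance of columns you don't need. The sketch below rebuilds the same example DataFrame and also shows the ddof parameter: ddof=1 is the default (sample variance), while ddof=0 gives the population variance. The variable names are chosen for illustration.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'username': ['Alice', 'Bob', 'Carl'],
    'age': [18, 22, 43],
    'income': [100000, 98000, 111000],
})

# Variance of a single column (sample variance, ddof=1 by default)
age_sample_var = df['age'].var()

# Population variance of the same column (divide by n instead of n - 1)
age_population_var = df['age'].var(ddof=0)

# Variance of an explicit selection of numeric columns
numeric_vars = df[['age', 'income']].var()

print(age_sample_var)
print(age_population_var)
print(numeric_vars)
```

Selecting the numeric columns explicitly, as in the last call, is another way to keep the string column out of the computation.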