# How to Calculate the Column Variance of a DataFrame in Python Pandas?


Want to calculate the variance of a column in your Pandas DataFrame?

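In case your last statistics course was a few years ago, let’s quickly recap the definition of variance: it’s the average squared deviation of the list elements from their mean. Note that pandas computes the sample variance by default, which divides the sum of squared deviations by n - 1 instead of n. Pass `ddof=0` to get the population variance, which matches the plain “average” definition.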
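Written out for n values x_1, ..., x_n with mean x̄, the population variance σ² divides the sum of squared deviations by n, while the sample variance s² divides by n - 1. Here are both formulas, worked through for the `age` column used below (18, 22, 43):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

\bar{x} = \frac{18 + 22 + 43}{3} = \frac{83}{3} \approx 27.667

\sum_{i=1}^{3} (x_i - \bar{x})^2
  = \left(-\tfrac{29}{3}\right)^2 + \left(-\tfrac{17}{3}\right)^2 + \left(\tfrac{46}{3}\right)^2
  = \frac{841 + 289 + 2116}{9} = \frac{3246}{9} \approx 360.667

\sigma^2 = \frac{3246}{27} \approx 120.222, \qquad
s^2 = \frac{3246}{18} \approx 180.333
```

The sample variance, about 180.333, is the value pandas reports for `age` by default.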

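You can calculate the variance of a Pandas DataFrame with the `DataFrame.var()` method, which computes the variance of each column and returns the results as a Series. You can then pick the column you’re interested in after the computation. Since pandas 2.0, non-numeric columns (such as `username` below) cause an error unless you pass `numeric_only=True`.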

```python
import pandas as pd

d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

print(df)
```

Here’s how you can calculate the variance of all numeric columns. Passing `numeric_only=True` skips the text column `username`:

```python
print(df.var(numeric_only=True))
```

The output is the variance of each numeric column:

```
age       1.803333e+02
income    4.900000e+07
dtype: float64
```
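To confirm where these numbers come from, here’s a minimal sketch that applies the definition by hand and compares it with pandas. `manual_variance` is a hypothetical helper written only for this check; its `ddof` argument mirrors the pandas parameter of the same name.

```python
import pandas as pd


def manual_variance(values, ddof=1):
    """Hypothetical helper: sum of squared deviations from the mean,
    divided by (n - ddof). ddof=1 gives the sample variance (pandas' default),
    ddof=0 gives the population variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


df = pd.DataFrame({'username': ['Alice', 'Bob', 'Carl'],
                   'age': [18, 22, 43],
                   'income': [100000, 98000, 111000]})

for col in ['age', 'income']:
    values = df[col].tolist()
    # Sample variance (divide by n - 1): should agree with pandas' default
    print(col, manual_variance(values), df[col].var())
    # Population variance (divide by n): should agree with ddof=0
    print(col, manual_variance(values, ddof=0), df[col].var(ddof=0))
```

Both pairs of numbers should match up to floating-point rounding.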

To get the variance of an individual column, access it using simple indexing:

```python
print(df.var(numeric_only=True)['age'])
# 180.33333333333334
```
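If you only need one column, you don’t have to compute the variance of every column first. A small sketch of two alternatives: select the column as a Series and call `var()` on it directly, or select a subset of columns and call `var()` on that smaller DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'username': ['Alice', 'Bob', 'Carl'],
                   'age': [18, 22, 43],
                   'income': [100000, 98000, 111000]})

# Series.var(): variance of a single column; no numeric_only needed
age_var = df['age'].var()
print(age_var)

# DataFrame.var() on a column subset returns a Series indexed by column name
subset_var = df[['age', 'income']].var()
print(subset_var)
```

Selecting first avoids computing variances you will throw away and sidesteps the non-numeric `username` column entirely.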

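Here is the complete example as a single script, with `numeric_only=True` added so it also runs on pandas 2.0 and later:

```python
import pandas as pd

d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)
print(df)

# Variance of every numeric column (sample variance, ddof=1 by default)
variances = df.var(numeric_only=True)
print(variances)

# Pick out a single column's variance from the result
print(variances['age'])
```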

## Where to Go From Here?

Before you can become a data science master, you first need to master Python. Join my free Python email course and receive your daily Python lesson directly in your INBOX. It’s fun!
