Want to calculate the variance of a column in your Pandas DataFrame?

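In case you attended your last statistics course a few years ago, let’s quickly recap the **definition of variance**: it’s the *average squared deviation of the list elements from the average value.* Note that Pandas computes the *sample* variance by default: it divides the sum of squared deviations by n - 1 rather than by n.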
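
To make the definition concrete, here is a minimal sketch that computes the variance by hand in plain Python. The list `ages` and the variable names are chosen only for illustration; the values are the same ages used in the DataFrame example further down.

```python
ages = [18, 22, 43]  # illustrative values, same as the 'age' column used later

# Average value of the list
avg = sum(ages) / len(ages)

# Squared deviation of each element from the average
squared_deviations = [(x - avg) ** 2 for x in ages]

# Population variance: divide by the number of elements n
population_variance = sum(squared_deviations) / len(ages)

# Sample variance: divide by n - 1 (this is what Pandas does by default)
sample_variance = sum(squared_deviations) / (len(ages) - 1)

print(population_variance)  # roughly 120.22
print(sample_variance)      # roughly 180.33
```

The second number is the one you should expect from Pandas unless you change its `ddof` parameter.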

You can calculate the variance of a Pandas DataFrame by using the `df.var()` method, which calculates the variance of each column. You can then pick the column you’re interested in from the result.

    import pandas as pd

    # Create your Pandas DataFrame
    d = {'username': ['Alice', 'Bob', 'Carl'],
         'age': [18, 22, 43],
         'income': [100000, 98000, 111000]}
    df = pd.DataFrame(d)

    print(df)

Your DataFrame looks like this:

|   | username | age | income |
|---|----------|-----|--------|
| 0 | Alice    | 18  | 100000 |
| 1 | Bob      | 22  | 98000  |
| 2 | Carl     | 43  | 111000 |

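Here’s how you can calculate the variance of all numeric columns. Passing `numeric_only=True` tells Pandas to skip the text column `username`; recent Pandas versions raise an error on it otherwise:

    print(df.var(numeric_only=True))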

The output is the variance of all numeric columns:

    age       1.803333e+02
    income    4.900000e+07
    dtype: float64
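
Notice that `username` does not appear in the result, because variance is only defined for numbers. How Pandas treats non-numeric columns depends on your version, so here is one possible sketch that makes the selection explicit: it filters the numeric columns with `select_dtypes()` before calling `var()`. The variable name `numeric_variances` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'username': ['Alice', 'Bob', 'Carl'],
    'age': [18, 22, 43],
    'income': [100000, 98000, 111000],
})

# Keep only the numeric columns, then compute the variance of each one
numeric_variances = df.select_dtypes(include='number').var()

print(numeric_variances)
```

This way the call does not rely on how `var()` handles string columns by default.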

To get the variance of an individual column, access it using simple indexing:

    print(df.var(numeric_only=True)['age'])
    # 180.33333333333334
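
If you only need one column, you can also select it first and call `var()` on the resulting Series. The sketch below shows this, along with the `ddof` parameter that switches between sample variance (the Pandas default, `ddof=1`) and population variance (`ddof=0`). The comparison with NumPy is included only as a sanity check; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'username': ['Alice', 'Bob', 'Carl'],
    'age': [18, 22, 43],
    'income': [100000, 98000, 111000],
})

# Select the column first, then compute its sample variance (divides by n - 1)
sample_var = df['age'].var()

# ddof=0 divides by n instead, which gives the population variance
population_var = df['age'].var(ddof=0)

# NumPy's var() uses ddof=0 by default, so it matches the population variance
numpy_var = np.var(df['age'].to_numpy())

print(sample_var)      # roughly 180.33
print(population_var)  # roughly 120.22
print(numpy_var)       # same as population_var
```

Selecting the column first avoids computing the variance of columns you don’t need. It also explains why Pandas and NumPy can report different numbers for the same data.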


## Where to Go From Here?

Before you can become a data science master, you first need to master Python. Join my free Python email course and receive your daily Python lesson directly in your INBOX. It’s fun!
