How to Get the Variance of a List in Python?

This article shows you how to calculate the variance of a given list of numerical inputs in Python.

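In case your last statistics course was a few years ago, let’s quickly recap the definition of variance: it’s the average squared deviation of the list elements from their average value. Strictly speaking, this is the population variance; the sample variance divides the sum of squared deviations by n - 1 instead of n.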
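Written as a formula for a list x_1, ..., x_n with average μ, the population variance looks like this, worked out for the list [1, 2, 3] that the examples below use (dividing by n - 1 instead of n would give the sample variance):

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2

% Worked example for [1, 2, 3]: n = 3, \mu = 2
\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.6667
```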

So, how do you calculate the variance of a given list in Python?

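Python 3.x doesn’t have a built-in function to calculate the variance (although the standard-library statistics module provides statistics.pvariance() and statistics.variance() since Python 3.4). Otherwise, use either of the following two methods: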

With External Dependency: Import the NumPy library with import numpy as np and use the np.var(list) function.
Without External Dependency: Calculate the average as sum(list)/len(list) and then calculate the variance in a generator expression.

Let’s have a look at both methods in Python code:

# 1. With External Dependency
import numpy as np
lst = [1, 2, 3]
var = np.var(lst)
print(var)
# 0.6666666666666666


# 2. W/O External Dependency
avg = sum(lst) / len(lst)
var = sum((x-avg)**2 for x in lst) / len(lst)
print(var)
# 0.6666666666666666

1. In the first example, you create the list and pass it as an argument to the np.var(lst) function of the NumPy library. Interestingly, NumPy functions accept basic collection types such as lists, not only NumPy arrays; the input is converted to an array internally. If you need to improve your NumPy skills, check out our in-depth blog tutorial.

2. In the second example, you first calculate the average as sum(list)/len(list). Then, you use a generator expression (similar to a list comprehension, but without building an intermediate list) to generate the individual squared differences, one per list element, by using the expression (x-avg)**2. You sum them up and divide by the number of list elements to obtain the variance.

Both methods lead to the same output.
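Both snippets compute the population variance (dividing by n). If your list is a sample from a larger population, you usually want the sample variance, which divides by n - 1. The sketch below shows two ways to get either value: NumPy's ddof ("delta degrees of freedom") parameter and the standard-library statistics module, whose pvariance() and variance() functions return the population and sample variance, respectively.

```python
import statistics

import numpy as np

lst = [1, 2, 3]

# Population variance: divide by n (NumPy's default is ddof=0)
pop_np = np.var(lst)
pop_stats = statistics.pvariance(lst)

# Sample variance: divide by n - 1 (ddof=1 in NumPy)
sample_np = np.var(lst, ddof=1)
sample_stats = statistics.variance(lst)

print(pop_np, pop_stats)        # both approximately 0.6667
print(sample_np, sample_stats)  # both equal 1
```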

Puzzle: Try to modify the elements in the list so that the variance is 1.0 instead of 0.6666666666666666.

This is the absolute minimum you need to know about calculating basic statistics such as the variance in Python. But there’s far more to it, and studying the other ways and alternatives will make you a better coder. So, let’s dive into some related questions and topics you may want to learn!

Variance in Python Pandas

Want to calculate the variance of a column in your Pandas DataFrame?

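You can do this by using the DataFrame method df.var(), which calculates the variance of every numeric column. You can then select the column you’re interested in from the result. Note that, unlike np.var(), pandas computes the sample variance by default (ddof=1, dividing by n - 1).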

import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

print(df)

Your DataFrame looks like this:

   username  age  income
0     Alice   18  100000
1       Bob   22   98000
2      Carl   43  111000

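Here’s how you can calculate the variance of all numeric columns:

print(df.var(numeric_only=True))  # skip the text column 'username'; recent pandas versions raise a TypeError without this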

The output is the variance of each numeric column:

age       1.803333e+02
income    4.900000e+07
dtype: float64

To get the variance of an individual column, access it using simple indexing:

print(df.var(numeric_only=True)['age'])
# 180.33333333333334
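Because pandas defaults to the sample variance (ddof=1) while np.var() defaults to the population variance (ddof=0), the two libraries can report different numbers for the same column. A minimal sketch of switching between them, rebuilding the same DataFrame so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'username': ['Alice', 'Bob', 'Carl'],
                   'age': [18, 22, 43],
                   'income': [100000, 98000, 111000]})

# pandas default: sample variance (divides by n - 1)
print(df['age'].var())
# ddof=0 switches pandas to the population variance (divides by n)
print(df['age'].var(ddof=0))
# NumPy's default is the population variance, so it matches ddof=0
print(np.var(df['age'].to_numpy()))
```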


Variance in NumPy

NumPy, Python’s package for data science computation, also has great statistics functionality. You can calculate all basic statistics such as average, median, variance, and standard deviation on NumPy arrays. Simply import the NumPy library and use the np.var(a) function to calculate the variance of NumPy array a.

Here’s the code:

import numpy as np

a = np.array([1, 2, 3])  # create a NumPy array
print(np.var(a))         # variance of the array
# 0.6666666666666666
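The same library covers the other basic statistics mentioned above. A short sketch computing the average, median, variance, and standard deviation of one array, plus the axis parameter for a column-wise variance of a 2D array (the 2D data is made up for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.mean(a))    # 2.0
print(np.median(a))  # 2.0
print(np.var(a))     # 0.6666666666666666
print(np.std(a))     # square root of the variance, about 0.8165

# For a 2D array, axis=0 computes one variance per column
m = np.array([[1, 10],
              [2, 20],
              [3, 30]])
print(np.var(m, axis=0))  # variances of [1, 2, 3] and [10, 20, 30]
```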

Python List Variance Without NumPy

Want to calculate the variance of a given list without using external dependencies?

Calculate the average as sum(list)/len(list) and then calculate the variance in a generator expression.

lst = [1, 2, 3]
avg = sum(lst) / len(lst)
var = sum((x-avg)**2 for x in lst) / len(lst)
print(var)
# 0.6666666666666666

You first calculate the average as sum(list)/len(list).

Then, you use a generator expression (similar to a list comprehension, but without building an intermediate list) to generate the individual squared differences, one per list element, by using the expression (x-avg)**2.

You sum them up and divide by the number of list elements to obtain the variance.
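If you need this in several places, you could wrap the two lines in a small helper. The function below is a hypothetical sketch, not a standard-library API: the name variance and its ddof parameter are our own choices (ddof mirrors NumPy's naming), and it raises a ValueError when there are too few values instead of failing with a ZeroDivisionError.

```python
def variance(values, ddof=0):
    """Hypothetical helper: variance of a sequence of numbers.

    ddof=0 divides by n (population variance);
    ddof=1 divides by n - 1 (sample variance).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough values to compute the variance")
    avg = sum(values) / n
    # Sum of squared deviations from the average, divided by n - ddof
    return sum((x - avg) ** 2 for x in values) / (n - ddof)


lst = [1, 2, 3]
print(variance(lst))          # 0.6666666666666666
print(variance(lst, ddof=1))  # 1.0
```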

Python List Standard Deviation

The standard deviation is the square root of the variance (wiki). It measures the dispersion of a data set, that is, how far the values typically lie from the average. You can calculate the standard deviation of the values in the list by using the statistics module; note that statistics.stdev() computes the sample standard deviation, dividing by n - 1:

import statistics as s

lst = [1, 0, 4, 3]
print(s.stdev(lst))
# 1.8257418583505538

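An alternative is NumPy’s np.std(lst, ddof=1) function. Without ddof=1, np.std() divides by n and returns the population standard deviation (about 1.58 for this list) instead.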
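Because the standard deviation is the square root of the variance, you can check the numbers yourself. This sketch pairs math.sqrt with the statistics module: stdev() and variance() are the sample versions (dividing by n - 1), while pstdev() and pvariance() are the population versions (dividing by n).

```python
import math
import statistics as s

lst = [1, 0, 4, 3]

# Sample standard deviation = square root of the sample variance
print(s.stdev(lst), math.sqrt(s.variance(lst)))

# Population standard deviation = square root of the population variance
print(s.pstdev(lst), math.sqrt(s.pvariance(lst)))
```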

Python List Median

What’s the median of a Python list? Formally, the median is “the value separating the higher half from the lower half of a data sample” (wiki).

How to calculate the median of a Python list?

Sort the list of elements using the sorted() built-in function in Python.
Calculate the index of the middle element by dividing the length of the list by 2 using integer division.
Return the middle element.

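Together, you can get the median by executing the expression median = sorted(income)[len(income)//2]. Note that for a list with an even number of elements, this returns the upper of the two middle values rather than their average.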

Here’s the concrete code example:

income = [80000, 90000, 100000, 88000]

average = sum(income) / len(income)
median = sorted(income)[len(income)//2]

print(average)
# 89500.0

print(median)
# 90000
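For an even-length list such as income, there are two middle values (88000 and 90000 after sorting), and the conventional median is their average. A minimal sketch that handles both cases, compared against statistics.median(); the helper name median is our own, hypothetical choice:

```python
import statistics


def median(values):
    """Hypothetical helper: averages the two middle values for even-length input."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd length: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even length: average of the two


income = [80000, 90000, 100000, 88000]
print(median(income))             # 89000.0
print(statistics.median(income))  # 89000.0
```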

Related tutorials:

Detailed tutorial on how to sort a list in Python on this blog.

Python List Mean

The mean value is exactly the same as the average value: sum up all values in your sequence and divide by the length of the sequence. You can use either the calculation sum(list) / len(list) or you can import the statistics module and call mean(list).

Here are both examples:

lst = [1, 4, 2, 3]

# method 1
average = sum(lst) / len(lst)
print(average)
# 2.5

# method 2
import statistics
print(statistics.mean(lst))
# 2.5
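The two methods do differ in one edge case: an empty list. sum(lst) / len(lst) fails with a ZeroDivisionError, while statistics.mean() raises statistics.StatisticsError (a subclass of ValueError). A small sketch that catches both:

```python
import statistics

empty = []

try:
    print(sum(empty) / len(empty))
except ZeroDivisionError as e:
    print("sum/len:", type(e).__name__)          # ZeroDivisionError

try:
    print(statistics.mean(empty))
except statistics.StatisticsError as e:
    print("statistics.mean:", type(e).__name__)  # StatisticsError
```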

For non-empty lists of numbers, both methods return the same value. The statistics module also offers some related functions (source):

mean()	Arithmetic mean (“average”) of data.
median()	Median (middle value) of data.
median_low()	Low median of data.
median_high()	High median of data.
median_grouped()	Median, or 50th percentile, of grouped data.
mode()	Mode (most common value) of discrete data.

median_low() and median_high() are especially useful if the list has an even number of elements, so there are two middle values, and you want to pick one of them instead of their average.
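For example, with the even-length income list from the median section, median() averages the two middle values, while median_low() and median_high() return one of them:

```python
import statistics

income = [80000, 90000, 100000, 88000]

print(statistics.median(income))       # 89000.0 (average of 88000 and 90000)
print(statistics.median_low(income))   # 88000
print(statistics.median_high(income))  # 90000

# mode() returns the most common value
print(statistics.mode([1, 2, 2, 3]))   # 2
```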

Python List Min Max

Python has the built-in functions min() and max() that calculate the minimum and maximum of a given list: min(list) returns the smallest value and max(list) returns the largest value.

Here’s an example of the minimum, maximum and average computations on a Python list:

lst = [1, 1, 2, 0]
average = sum(lst) / len(lst)
minimum = min(lst)
maximum = max(lst)

print(average)
# 1.0

print(minimum)
# 0

print(maximum)
# 2
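As a related measure of spread, max(lst) - min(lst) gives the range of the data. Also, min() and max() raise a ValueError on an empty list unless you pass the default keyword argument (available since Python 3.4):

```python
lst = [1, 1, 2, 0]

# Range: difference between the largest and the smallest value
print(max(lst) - min(lst))    # 2

# default= is returned instead of raising ValueError for empty input
print(min([], default=None))  # None
print(max([], default=0))     # 0
```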

Where to Go From Here

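Summary: Python 3.x doesn’t have a built-in function to calculate the variance (the standard-library statistics module does offer statistics.pvariance() and statistics.variance()). Otherwise, use either of the following two methods: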

With External Dependency: Import the NumPy library with import numpy as np and use the np.var(list) function.
Without External Dependency: Calculate the average as sum(list)/len(list) and then calculate the variance in a generator expression.

If you keep struggling with those basic Python commands and you feel stuck in your learning progress, I’ve got something for you: Python One-Liners.

In the book, I’ll give you a thorough overview of critical computer science topics such as machine learning, regular expressions, data science, NumPy, and Python basics, all in a single line of Python code!


OFFICIAL BOOK DESCRIPTION: Python One-Liners will show readers how to perform useful tasks with one line of Python code. Following a brief Python refresher, the book covers essential advanced topics like slicing, list comprehension, broadcasting, lambda functions, algorithms, regular expressions, neural networks, logistic regression and more. Each of the 50 book sections introduces a problem to solve, walks the reader through the skills necessary to solve that problem, then provides a concise one-liner Python solution with a detailed explanation.