5 Best Ways to Calculate Mean Deviation in Python - Be on the Right Side of Change

💡 Problem Formulation: The mean deviation (also called the mean absolute deviation) is the average absolute distance between each data point in a set and the mean of that set. This article covers ways to compute the mean deviation of the elements in a Python list, like [10, 12, 23, 23, 16, 23, 21, 16], with the desired output being a single numerical value.
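
Written as a formula, and worked through by hand for the example list, the calculation looks like this. The symbols MD, n, x_i, and x̄ are notation introduced here for illustration; they do not appear in the code samples.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert
\]
\[
\bar{x} = \frac{10+12+23+23+16+23+21+16}{8} = \frac{144}{8} = 18,
\qquad
\mathrm{MD} = \frac{8+6+5+5+2+5+3+2}{8} = \frac{36}{8} = 4.5
\]
```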

Method 1: Using Pure Python

This method computes the mean of the list and then averages the absolute difference between each element and that mean. It relies only on Python’s built-in functions, so no external libraries are needed.

Here’s an example:


def mean_deviation(data):
    mean = sum(data) / len(data)
    deviations = [abs(x - mean) for x in data]
    mean_dev = sum(deviations) / len(data)
    return mean_dev

data_set = [10, 12, 23, 23, 16, 23, 21, 16]
print(mean_deviation(data_set))

Output: 4.5

This code defines a function mean_deviation() that computes the mean of a provided list, calculates each element’s absolute deviation from the mean, and finally returns the average of these deviations as the mean deviation of the list.
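
As a variation, the standard library’s statistics module can do the averaging. The sketch below is an illustration rather than part of the original method: the function name mean_deviation_checked and the empty-input check are assumptions added here, and statistics.fmean requires Python 3.8 or later.

```python
from statistics import fmean


def mean_deviation_checked(data):
    """Return the mean absolute deviation of data around its mean.

    Hypothetical helper: the name and the empty-input check are
    illustrative additions, not part of the original example.
    """
    values = list(data)  # materialize so iterators can be traversed twice
    if not values:
        raise ValueError("mean deviation requires at least one data point")
    center = fmean(values)  # floating-point mean of the data
    return fmean(abs(x - center) for x in values)


data_set = [10, 12, 23, 23, 16, 23, 21, 16]
print(mean_deviation_checked(data_set))  # 4.5
```

Raising an explicit ValueError gives a clearer message than the ZeroDivisionError that the plain version produces for an empty list.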

Method 2: Using NumPy Library

NumPy, a powerful library for numerical computations in Python, provides vectorized functions such as mean() and abs() that operate on whole arrays at once. Employing NumPy reduces the computation of mean deviation to a couple of array expressions.

Here’s an example:


import numpy as np

data_set = np.array([10, 12, 23, 23, 16, 23, 21, 16])
mean = np.mean(data_set)
mean_dev = np.mean(np.abs(data_set - mean))
print(mean_dev)

Output: 4.5

In this snippet, we use NumPy’s mean() function to find the mean and then apply vectorized subtraction to compute the deviations. We again use mean() on the absolute deviations to find the mean deviation.
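
The same vectorized idea extends to two-dimensional data. The following is a minimal sketch, assuming you want one mean deviation per row or per column; the helper name mean_deviation_along_axis and the sample table are hypothetical and not part of the original example.

```python
import numpy as np


def mean_deviation_along_axis(values, axis=0):
    """Mean absolute deviation along one axis (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    # keepdims=True keeps the reduced axis so the means broadcast
    # back against the original array
    center = arr.mean(axis=axis, keepdims=True)
    return np.abs(arr - center).mean(axis=axis)


table = np.array([[1, 10],
                  [2, 20],
                  [3, 30]])
print(mean_deviation_along_axis(table, axis=1))  # one value per row
print(mean_deviation_along_axis(table, axis=0))  # one value per column
```

For this table the per-row values are 4.5, 9.0, and 13.5.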

Method 3: Using pandas Library

pandas is a widely-used data manipulation library that makes handling data easier. If the data is already in a pandas DataFrame or Series, using pandas to calculate the mean deviation is quite convenient.

Here’s an example:


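import pandas as pd

data_series = pd.Series([10, 12, 23, 23, 16, 23, 21, 16])
# Series.mad() was deprecated in pandas 1.5 and removed in 2.0,
# so compute the mean absolute deviation explicitly
mean_dev = (data_series - data_series.mean()).abs().mean()
print(mean_dev)

Output: 4.5

This snippet subtracts the Series mean from every element, takes the absolute values with abs(), and averages them with mean(). Older pandas releases offered a built-in mad() method for this (“mean absolute deviation” from the mean), but it was deprecated in pandas 1.5 and removed in pandas 2.0, so the explicit expression is the portable way to achieve our goal with a pandas data structure.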
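
Since mad() is not available in pandas 2.0 and later, an explicit expression is also the safer choice for whole DataFrames. Below is a minimal sketch, assuming all columns are numeric; the helper name mean_deviation_columns and the column labels "a" and "b" are hypothetical.

```python
import pandas as pd


def mean_deviation_columns(frame):
    """Mean absolute deviation of each numeric column (hypothetical helper)."""
    # frame.mean() is a Series indexed by column name, so the subtraction
    # aligns each column with its own mean
    return (frame - frame.mean()).abs().mean()


df = pd.DataFrame({
    "a": [10, 12, 23, 23, 16, 23, 21, 16],
    "b": [1, 2, 3, 4, 5, 6, 7, 8],
})
print(mean_deviation_columns(df))
```

The result is a Series with one entry per column: 4.5 for "a" and 2.0 for "b".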

Method 4: Using SciPy Library

SciPy is another scientific library that builds on NumPy and provides a host of statistical and numerical tools. Computing the mean deviation with SciPy makes sense when the calculation is part of a larger analytical workflow that already depends on it.

Here’s an example:


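import numpy as np
from scipy import linalg

data_set = np.array([10, 12, 23, 23, 16, 23, 21, 16])
# The L1 norm of the mean-centered data is the sum of absolute deviations
mean_dev = linalg.norm(data_set - data_set.mean(), ord=1) / data_set.size
print(mean_dev)

Output: 4.5

SciPy’s stats module has no mean_absolute_deviation() function; its median_abs_deviation() computes the median absolute deviation, which is a different statistic. The code instead uses scipy.linalg.norm() with ord=1 to sum the absolute deviations from the mean, then divides by the number of elements. It’s a workable choice when you’re already using SciPy for other analyses, although it offers little over the plain NumPy version.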

Bonus One-Liner Method 5: Using a Generator Expression and Built-in Functions

This approach is a compact one-liner that uses Python’s built-in functions and a generator expression to calculate the mean deviation without explicitly defining a function.

Here’s an example:


data_set = [10, 12, 23, 23, 16, 23, 21, 16]
mean_dev = sum(abs(x - (sum(data_set) / len(data_set))) for x in data_set) / len(data_set)
print(mean_dev)

Output: 4.5

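This one-liner uses a generator expression inside sum() to total the absolute deviations, then divides by the length of the data set to find the mean deviation. Note that the inner sum(data_set) / len(data_set) is re-evaluated for every element, so the running time grows quadratically with the length of the list.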
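
For longer lists, the mean can be computed once up front, which avoids re-evaluating sum(data_set) / len(data_set) for every element. A minimal sketch that stays almost as short is shown below; the variable name mean is an illustrative choice.

```python
data_set = [10, 12, 23, 23, 16, 23, 21, 16]

# Compute the mean once instead of once per element
mean = sum(data_set) / len(data_set)
mean_dev = sum(abs(x - mean) for x in data_set) / len(data_set)
print(mean_dev)  # 4.5
```

This trades the single expression for two statements, but each pass over the list now does a constant amount of work per element.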

Summary/Discussion

Method 1: Pure Python. Independent of any libraries, it’s universally applicable but can be less efficient with large data sets. Good for educational purposes and small-scale applications.
Method 2: NumPy Library. It’s fast and efficient, particularly suitable for large numerical datasets. The downside is the requirement of the external NumPy library.
Method 3: pandas Library. Perfect for data analysis workflows already using pandas, and it’s user-friendly. However, it might be overkill for simple operations if pandas isn’t already in use.
Method 4: SciPy Library. A reasonable choice when SciPy is already part of a larger statistical workflow, but it has no dedicated function for the mean deviation and, similar to pandas, it can be excessive for single-step calculations.
Method 5: One-Liner. Quick and easy for small data sets or for quick calculations without needing a function. Can be less readable and more difficult to debug with more complex computations.