Problem Formulation: How to calculate the weighted average of the elements in a NumPy array?
Definition of weighted average: Each array element has an associated weight. The weighted average is the sum of all array elements, each multiplied by its weight, divided by the sum of all weights.
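To make the definition concrete, here is a minimal plain-Python sketch of the formula: multiply each element by its weight, add the products up, and divide by the sum of the weights. The helper name weighted_average is hypothetical (it is not a NumPy function), and the sketch assumes flat, equally long sequences of numbers.

```python
def weighted_average(values, weights):
    """Hypothetical helper: sum(v * w) / sum(w) for equally long sequences."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights must not sum to zero")
    # Each element contributes value * weight to the numerator
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    return weighted_sum / total_weight


# Same numbers as in the quick solution, flattened row by row
print(weighted_average([1, 0, 2, 1, 1, 1], [2, 1, 1, 1, 1, 2]))  # 1.0
```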
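Here's the problem exemplified: given the array [[1, 0, 2], [1, 1, 1]] with the weights [[2, 1, 1], [1, 1, 2]], the weighted average is (1·2 + 0·1 + 2·1 + 1·1 + 1·1 + 1·2) / (2 + 1 + 1 + 1 + 1 + 2) = 8 / 8 = 1.0.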
Quick solution: Before we discuss the details, here's the code that solves this exact problem:
```python
import numpy as np

array = np.array([[1, 0, 2],
                  [1, 1, 1]])
weights = np.array([[2, 1, 1],
                    [1, 1, 2]])

print(np.average(array, weights=weights))
# 1.0
```
Want to learn how it works—and how you can average along an axis as well? Let’s dive deeper into the problem next!
Weighted Average with NumPy’s np.average() Function
NumPy's np.average(arr) function computes the average of all numerical values in a NumPy array. When called with only one array argument, it calculates the numerical average of all values in the array, no matter the array's dimensionality. For example, the expression np.average([[1,2],[2,3]]) results in the average value (1+2+2+3)/4 = 2.0.
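A quick way to check that the dimensionality does not matter: the sketch below averages the 2D array from the example and an illustrative 3D array (made up for this check), then compares the 3D result with the average of its flattened copy.

```python
import numpy as np

m2 = np.array([[1, 2], [2, 3]])
m3 = np.arange(8).reshape(2, 2, 2)  # values 0..7 arranged in a 3D array

# Without axis or weights, np.average uses every element, whatever the shape
print(np.average(m2))  # 2.0
print(np.average(m3))  # 3.5

# Same result as averaging the flattened 1D array
print(np.average(m3.ravel()))  # 3.5
```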
However, what if you want to calculate the weighted average of a NumPy array? In other words, you want to overweight some array values and underweight others.
You can easily accomplish this by passing the weights argument to NumPy's average function:
```python
import numpy as np

a = [-1, 1, 2, 2]

print(np.average(a))
# 1.0

print(np.average(a, weights=[1, 1, 1, 5]))
# 1.5
```
In the first example, we simply averaged over all array values: (-1+1+2+2)/4 = 1.0. In the second example, however, we overweight the last array element 2: it now carries five times the weight of each other element, resulting in the following computation: (-1+1+2+(2+2+2+2+2))/8 = 1.5.
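For integer weights, you can read a weight of k as "this element appears k times". The following sketch checks that interpretation with np.repeat; it only illustrates the arithmetic above and is not how np.average is implemented internally.

```python
import numpy as np

a = [-1, 1, 2, 2]
w = [1, 1, 1, 5]

weighted = np.average(a, weights=w)

# Repeat each element as many times as its integer weight, then take a plain average
repeated = np.repeat(a, w)  # [-1, 1, 2, 2, 2, 2, 2, 2]
plain = np.average(repeated)

print(weighted, plain)  # 1.5 1.5
```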
NumPy Average Syntax
Let's explore the different parameters we can pass to np.average(...).
- a: The NumPy array, which can be multi-dimensional.
- axis (optional): The axis along which you want to average. If you don't specify it, the average is computed over the whole array.
- weights (optional): The weight of each value in a. If you don't specify it, all values are weighted equally (each with weight 1).
- returned (optional): Controls the return value. Only if you set it to True do you get a tuple (average, sum_of_weights) as a result, which can help you normalize the output. In most cases, you can skip this argument.
average(a, axis=None, weights=None, returned=False)
Argument | Description |
---|---|
a | array-like: The array containing the data to be averaged. It can be multi-dimensional and doesn't have to be a NumPy array, but usually it is. |
axis=None | None or int or tuple of ints: The axis (or axes) along which to average the array a. |
weights=None | array-like: An array of weights associated with the values in a. It lets you customize how much each element contributes to the average. It must either have the same shape as a or, when an integer axis is given, be 1-D with a length equal to the size of a along that axis. |
returned=False | Boolean: If False, returns only the average. If True, returns the tuple (average, sum_of_weights) so that you can easily normalize the weighted average. |
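The sketch below exercises the weights and returned arguments together on an illustrative array (the data is made up for this example). It passes 1-D weights with axis=0, so there must be exactly one weight per row.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# 1-D weights with axis=0: one weight per row (length == data.shape[0])
col_avg, wsum = np.average(data, axis=0, weights=[1, 3], returned=True)

print(col_avg.tolist())  # [3.25, 4.25, 5.25]
# The sum of weights (1 + 3) is reported once per column
print(wsum.tolist())     # [4.0, 4.0, 4.0]
```

Because the weights 1 and 3 sum to 4, each column average is (first row + 3 * second row) / 4, for example (1 + 12) / 4 = 3.25 for the first column.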
NumPy Weighted Average Along an Axis (Puzzle)
Here is an example of how to compute a weighted average for each column of a 2D NumPy array, with one weight for each of the two rows.
```python
import numpy as np

# daily stock prices
# [morning, midday, evening]
solar_x = np.array(
    [[2, 3, 4],   # today
     [2, 2, 5]])  # yesterday

# midday - weighted average
print(np.average(solar_x, axis=0, weights=[3/4, 1/4])[1])
```
What is the output of this puzzle?
*Beginner Level* (solution below)
Puzzle Explanation
NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices.
This puzzle introduces the average function from the NumPy library. When applied to a 1D NumPy array, this function returns the average of the array values. When applied to a 2D NumPy array without an axis argument, it treats the array as if it were flattened: the result is the average of all values in the flattened 1D array.
In the puzzle, we have a matrix with two rows and three columns. The matrix gives the stock prices of the solar_x stock. Each row holds the prices for one day (the first row is today, the second yesterday). The first column specifies the morning price, the second the midday price, and the third the evening price.
Now suppose we do not want the average of the flattened matrix but the average midday price. Moreover, we want to overweight the most recent stock price: today accounts for three-quarters and yesterday for one-quarter of the final average value.
NumPy enables this via the weights parameter in combination with the axis parameter.
- The weights parameter defines the weight of each value participating in the average calculation.
- The axis parameter specifies the direction along which the average is calculated.
In a 2D matrix, axis=0 runs vertically down the rows and axis=1 runs horizontally across the columns. We want three average values: one each for the morning, midday, and evening prices. So we average along axis=0, which collapses the two rows (days) into three per-column averages. Then we take the second element (index 1) to get the weighted average of the midday price.
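To see what axis=0 does with the puzzle's data, the sketch below computes all three weighted column averages, reproduces them by hand, and contrasts them with axis=1. The by-hand version skips the division by the sum of weights only because 3/4 + 1/4 = 1.

```python
import numpy as np

solar_x = np.array([[2, 3, 4],   # today
                    [2, 2, 5]])  # yesterday
w = [3/4, 1/4]

# axis=0 collapses the rows: one weighted average per column
by_column = np.average(solar_x, axis=0, weights=w)

# Same thing by hand: scale each row by its weight, then add the rows
# (no division needed here because the weights already sum to 1)
by_hand = w[0] * solar_x[0] + w[1] * solar_x[1]

print(by_column.tolist())  # [2.0, 2.75, 4.25]
print(by_hand.tolist())    # [2.0, 2.75, 4.25]

# axis=1 would instead give one (unweighted) average per day
print(np.average(solar_x, axis=1).tolist())  # [3.0, 3.0]
```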
Where to Go From Here?
Want to thrive in data science? Master NumPy first. You need to know the most important concepts (such as the axis argument) before you dive into machine learning and data science. Only then can you properly understand the algorithms and build your career on a solid foundation.
To help you accomplish this, we've written an easy-to-read, fun introduction to the NumPy library. It's 100% based on puzzle-based learning: you solve rated NumPy puzzles, test your skills, and improve over time. Check it out; it's fun!
Solution
2.75