Problem Formulation: How to calculate the weighted average of the elements in a NumPy array?
Definition weighted average: Each array element has an associated weight. The weighted average is the sum of all array elements, properly weighted, divided by the sum of all weights.
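As a formula, the definition above can be sketched as follows. The symbols are chosen here for illustration: x_i are the n array elements and w_i are their weights.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i}
```

If all weights are equal, this reduces to the ordinary arithmetic mean.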
Here’s the problem exemplified:
Quick solution: Before we discuss the solution in great detail, here’s the solution that solves this exact problem:
import numpy as np

array = np.array([[1, 0, 2], [1, 1, 1]])
weights = np.array([[2, 1, 1], [1, 1, 2]])

print(np.average(array, weights=weights))
# 1.0
Want to learn how it works—and how you can average along an axis as well? Let’s dive deeper into the problem next!
Weighted Average with NumPy’s np.average() Function
The np.average(arr) function computes the average of all numerical values in a NumPy array. When used with only one array argument, it calculates the average of all values in the array, no matter the array’s dimensionality. For example, the expression np.average([[1,2],[2,3]]) results in the average value (1+2+2+3)/4 = 2.0.
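A minimal sketch of this behavior, using the same 2x2 values as in the expression above (the variable names are only illustrative): averaging the 2D array directly and averaging its flattened 1D version give the same number.

```python
import numpy as np

# Example matrix with the same values as in the text (names are illustrative).
matrix = np.array([[1, 2], [2, 3]])

# Without an axis argument, all four values are averaged together.
overall = np.average(matrix)

# Averaging the flattened 1D copy gives the same number.
flattened = np.average(matrix.flatten())

print(overall)    # 2.0
print(flattened)  # 2.0
```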
However, what if you want to calculate the weighted average of a NumPy array? In other words, you want to overweight some array values and underweight others.
import numpy as np

a = [-1, 1, 2, 2]

print(np.average(a))
# 1.0

print(np.average(a, weights=[1, 1, 1, 5]))
# 1.5
In the first example, we simply averaged over all array values: (-1+1+2+2)/4 = 1.0. However, in the second example, we overweight the last array element 2. It now carries five times the weight of the other elements, resulting in the following computation: (-1+1+2+(2+2+2+2+2))/8 = 1.5.
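To double-check the arithmetic, here is a small sketch that computes the weighted average by hand (sum of value times weight, divided by the sum of the weights) and compares it with np.average(). The variable names manual and builtin are just for illustration.

```python
import numpy as np

a = np.array([-1, 1, 2, 2])
w = np.array([1, 1, 1, 5])

# Weighted average by hand: sum of value*weight divided by sum of weights.
manual = np.sum(a * w) / np.sum(w)

# The same computation via np.average.
builtin = np.average(a, weights=w)

print(manual)   # 1.5
print(builtin)  # 1.5
```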
NumPy Average Syntax
Let’s explore the different parameters we can pass to np.average():
- a: The NumPy array, which can be multi-dimensional.
- axis (optional): The axis along which you want to average. If you don’t specify the argument, the averaging is done over the whole array.
- weights (optional): The weights associated with the values in the array. When you average along an axis, a 1D sequence with one weight per entry along that axis is sufficient. If you don’t specify the argument, all values are weighted equally.
- returned (optional): A flag that controls the return value of the function. Only if you set this to True, you will get a tuple (average, weights_sum) as a result. This may help you to normalize the output. In most cases, you can skip this argument.
average(a, axis=None, weights=None, returned=False)
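|Argument|Description|
|a|array-like: The array contains the data to be averaged. It can be multi-dimensional and doesn’t have to be a NumPy array, but usually it is.|
|axis=None|None or int or tuple of ints: The axis along which to average the array a.|
|weights=None|array-like: An array of weights associated with the values in the array a.|
|returned=False|Boolean: If True, the function returns the tuple (average, sum_of_weights) instead of only the average.|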
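The following sketch shows how the axis, weights, and returned arguments might be combined. It reuses the arrays from the quick solution; the names data, avg, and row_avg are illustrative only.

```python
import numpy as np

data = np.array([[1, 0, 2],
                 [1, 1, 1]])
weights = np.array([[2, 1, 1],
                    [1, 1, 2]])

# returned=True yields a tuple: (weighted average, sum of weights).
avg, weight_sum = np.average(data, weights=weights, returned=True)
print(avg, weight_sum)  # 1.0 8.0

# With axis=1, each row is averaged separately using its own weights.
row_avg, row_weight_sum = np.average(data, axis=1, weights=weights, returned=True)
print(row_avg)         # [1. 1.]
print(row_weight_sum)  # [4. 4.]
```

The returned sum of weights is the denominator of the weighted average, which is why it can be handy for normalizing or combining partial results.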
NumPy Weighted Average Along an Axis (Puzzle)
Here is an example of how to average along the columns of a 2D NumPy array with a specified weight for each of the two rows.
import numpy as np

# daily stock prices
# [morning, midday, evening]
solar_x = np.array(
    [[2, 3, 4],   # today
     [2, 2, 5]])  # yesterday

# midday - weighted average
print(np.average(solar_x, axis=0, weights=[3/4, 1/4])[1])
What is the output of this puzzle?
*Beginner Level* (solution below)
NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices.
This puzzle introduces the average function from the NumPy library. When applied to a 1D NumPy array, this function returns the average of the array values. When applied to a 2D NumPy array without an axis argument, it treats the array as if it were flattened: the result is the average of all values in the array.
In the puzzle, we have a matrix with two rows and three columns. The matrix gives the stock prices of the solar_x stock. Each row represents the prices for one day. The first column specifies the morning price, the second the midday price, and the third the evening price.
Now suppose we do not want to know the average of the flattened matrix but the average of the midday price. Moreover, we want to overweight the most recent stock price. Today accounts for three-quarters and yesterday for one-quarter of the final average value.
NumPy enables this via the weights parameter in combination with the axis parameter.
- The weights parameter defines the weight for each value participating in the average calculation.
- The axis parameter specifies the direction along which the average should be calculated.
In a 2D matrix, axis=0 runs along the rows and axis=1 runs along the columns. We want to know three average values: one each for the morning, midday, and evening. So we average along axis=0, which collapses the two rows (days) and yields one value per column. Finally, we take the second element, which gives the midday average: 3/4 * 3 + 1/4 * 2 = 2.75.
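The walkthrough above can be sketched step by step as follows. The names day_weights, per_time, and per_day are illustrative and not part of the original puzzle.

```python
import numpy as np

# Rows: [today, yesterday]; columns: [morning, midday, evening]
solar_x = np.array([[2, 3, 4],
                    [2, 2, 5]])
day_weights = [3/4, 1/4]  # today counts three times as much as yesterday

# axis=0 collapses the rows (days): one weighted average per column.
per_time = np.average(solar_x, axis=0, weights=day_weights)
print(per_time)     # [2.   2.75 4.25]
print(per_time[1])  # 2.75

# axis=1 collapses the columns: one plain average per day (row).
per_day = np.average(solar_x, axis=1)
print(per_day)      # [3. 3.]
```

Note that the 1D weights list has two entries, one per row, because the averaging runs over the two rows when axis=0.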
Where to Go From Here?
Want to thrive in data science? Master NumPy first. You need to know the most important concepts (such as the axis argument) before you dive into machine learning and data science. Only then, you can properly understand the algorithms and build your career on a solid foundation.
To help you accomplish this, we’ve written an easy-to-read, fun introduction into the NumPy library. It’s 100% based on puzzle-based learning: you solve rated NumPy puzzles, test your skills, and improve over time. Check it out—it’s fun! 🙂
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.