NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices. It’s at the core of data science and machine learning in Python. In today’s article, you’ll master NumPy’s impressive average() function, which will be a loyal friend to you when fighting your upcoming data science battles.
average(a, axis=None, weights=None, returned=False)
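|a|array-like: The array containing the data to be averaged. Can be multi-dimensional and it doesn’t have to be a NumPy array, but usually it is.|
|axis|None or int or tuple of ints: The axis along which to average the array. With the default None, the average is taken over all values.|
|weights|array-like: An array of weights associated with the values in the array.|
|returned|Boolean: If True, the tuple (average, sum_of_weights) is returned; otherwise only the average is returned.|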
Here’s a short summary of the average() function:
NumPy’s average function computes the average of all numerical values in a NumPy array. When called with only the array as its argument, it simply calculates the numerical average of all values in the array, no matter the array’s dimensionality. For example, the expression np.average([[1,2],[2,3]]) results in the average value (1+2+2+3)/4 = 2.0.
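To make the default behavior concrete, here is a minimal sketch that compares np.average with a manual sum-and-count computation. The variable names (data, arr, manual) are illustrative only and are not part of the original example.

```python
import numpy as np

# A nested list works as input; NumPy converts it to an array internally.
data = [[1, 2], [2, 3]]

# Without further arguments, all four values are averaged together.
result = np.average(data)

# Manual check: sum of all elements divided by the number of elements.
arr = np.asarray(data)
manual = arr.sum() / arr.size

print(result)  # 2.0
print(manual)  # 2.0
```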
How to Calculate the Weighted Average of a NumPy Array in Python?
However, what if you want to calculate the weighted average of a NumPy array? In other words, you want to overweight some array values and underweight others.
You can easily accomplish this by passing the weights argument to NumPy’s average function:
import numpy as np

a = [-1, 1, 2, 2]

print(np.average(a))
# 1.0

print(np.average(a, weights=[1, 1, 1, 5]))
# 1.5
In the first example, we simply averaged over all array values:
(-1+1+2+2)/4 = 1.0. However, in the second example, we overweight the last array element 2—it now carries five times the weight of the other elements resulting in the following computation:
(-1+1+2+(2+2+2+2+2))/8 = 1.5.
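The computation above is the usual weighted-average formula, sum(w_i * a_i) / sum(w_i). The following sketch checks that formula by hand and also shows the returned argument from the signature. The names w, manual, avg, and weight_sum are chosen here for illustration.

```python
import numpy as np

a = np.array([-1, 1, 2, 2])
w = np.array([1, 1, 1, 5])

# Weighted average by hand: sum(w_i * a_i) / sum(w_i) = 12 / 8
manual = (a * w).sum() / w.sum()

# With returned=True, np.average returns a tuple:
# (weighted average, sum of the weights).
avg, weight_sum = np.average(a, weights=w, returned=True)

print(manual)      # 1.5
print(avg)         # 1.5
print(weight_sum)  # 8.0
```

Setting all weights to the same value gives the plain average again, since the common factor cancels in the formula.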
How to Average Along an Axis?
Extracting basic statistics from matrices (e.g., average, variance, standard deviation) is a critical component of analyzing a wide range of data sets such as financial data, health data, or social media data. With the rise of machine learning and data science, proficiency with NumPy’s linear algebra operators becomes more and more valuable in the marketplace.
In the following, you’ll learn how to average along an axis. Here’s what you want to achieve:
Here is how you can average along an axis in NumPy:
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x, axis=1))
# [3. 1. 2.]
NumPy internally represents data using NumPy arrays (np.ndarray objects, typically created with np.array). These arrays can have an arbitrary number of dimensions. In the figure above, we show a two-dimensional NumPy array.
In practice, the array can have much higher dimensionality. You can quickly identify the dimensionality of a NumPy array by counting the number of opening brackets “[” at the very start of the array when creating it. The more formal alternative would be to use the array’s ndim attribute.
Each dimension has its own axis identifier. As a rule of thumb: the outermost dimension has the identifier “0”, the second-outermost dimension has the identifier “1”, and so on.
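As a small sketch of how the axis identifiers map onto a concrete array, the snippet below inspects the ndim and shape attributes of the two-dimensional example and averages once along each axis. The names col_avg and row_avg are illustrative.

```python
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(x.ndim)   # 2
print(x.shape)  # (3, 3)

# axis=0 is the outermost dimension (the rows). Aggregating it
# leaves one average per column.
col_avg = np.average(x, axis=0)

# axis=1 is the second-outermost dimension (the columns). Aggregating it
# leaves one average per row.
row_avg = np.average(x, axis=1)

print(col_avg)
print(row_avg)  # [3. 1. 2.]
```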
By default, the NumPy average function aggregates all the values in a NumPy array to a single value:
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x))
# 2.0
For example, the simple average of a NumPy array is calculated as follows:
(1+3+5+1+1+1+0+2+4)/9 = 18/9 = 2.0
Calculating Average, Variance, Standard Deviation Along an Axis
However, sometimes you want to average along an axis.
For example, you may work at a large financial corporation and want to calculate the average value of a stock price — given a large matrix of stock prices (rows = different stocks, columns = daily stock prices).
Here is how you can do this by specifying the keyword “axis” as an argument to the average function:
import numpy as np

## Stock Price Data: 5 companies
# (row=[price_day_1, price_day_2, ...])
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1],
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

avg = np.average(x, axis=1)

print("Averages: " + str(avg))
"""
Averages: [10.   1.5  7.   6.   3. ]
"""
Note that you apply the function along axis=1, i.e., this is the axis that is aggregated to a single value. Hence, the resulting NumPy array has one dimension fewer than the input: the two-dimensional matrix of prices becomes a one-dimensional array with one average per stock.
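The same axis argument carries over to the variance and standard deviation named in this section's heading. The sketch below uses np.var and np.std on the stock data. Both default to the population variants (ddof=0), which is an assumption about what you want here; pass ddof=1 for the sample variants.

```python
import numpy as np

# Same stock price data as above: 5 companies, 4 days each.
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1],
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

avg = np.average(x, axis=1)  # one average per stock
var = np.var(x, axis=1)      # population variance per stock
std = np.std(x, axis=1)      # population standard deviation per stock

# Aggregating axis=1 removes that dimension from the result.
print(x.shape)    # (5, 4)
print(avg.shape)  # (5,)

print("Variances: " + str(var))
print("Standard deviations: " + str(std))
```

For the first stock, the deviations from the average 10 are -2, -1, 1, and 2, so its variance is (4+1+1+4)/4 = 2.5.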
High-dimensional Averaging Along An Axis
Of course, you can also perform this averaging along an axis for high-dimensional NumPy arrays. Conceptually, you’ll always aggregate the axis you specify as an argument.
Here is an example:
import numpy as np

x = np.array([[[1,2], [1,1]],
              [[1,1], [2,1]],
              [[1,0], [0,0]]])

print(np.average(x, axis=2))
"""
[[1.5 1. ]
 [1.  1.5]
 [0.5 0. ]]
"""
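To see which dimension disappears, it can help to compare shapes before and after averaging. The sketch below does that for the three-dimensional example and also passes a tuple of ints as axis, as permitted by the argument description, to aggregate two axes at once. The names inner and per_block are illustrative.

```python
import numpy as np

x = np.array([[[1, 2], [1, 1]],
              [[1, 1], [2, 1]],
              [[1, 0], [0, 0]]])

print(x.shape)  # (3, 2, 2)

# Aggregating the innermost axis removes the last dimension.
inner = np.average(x, axis=2)
print(inner.shape)  # (3, 2)

# A tuple of axes aggregates both at once: one average per 2x2 block.
per_block = np.average(x, axis=(1, 2))
print(per_block.shape)  # (3,)
print(per_block)        # [1.25 1.25 0.25]
```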
NumPy Average Puzzle
Puzzles are a great way to test and train your coding skills. Have a look at the following puzzle:
import numpy as np

# Goals in five matches
goals_brazil = np.array([1,2,3,1,2])
goals_germany = np.array([1,0,1,2,0])

br = np.average(goals_brazil)
ge = np.average(goals_germany)

print(br>ge)
Exercise: What is the output of this puzzle?
This puzzle introduces one new feature of the NumPy library: the average function. When applied to a 1D array, this function returns the average value of the array.
In the puzzle, Brazil averaged 9/5 = 1.8 goals over its last five games and Germany 4/5 = 0.8. On average, Brazil scored one more goal per game, so the comparison br>ge evaluates to True and the puzzle prints True.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s the author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.