NumPy Average

NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices. It’s at the core of data science and machine learning in Python. In today’s article, you’re going to master NumPy’s impressive average() function, which will be a loyal friend to you when fighting your upcoming data science battles.

average(a, axis=None, weights=None, returned=False)
a (array-like): The array containing the data to be averaged. It can be multi-dimensional, and it doesn’t have to be a NumPy array, but usually it is.
axis=None (None, int, or tuple of ints): The axis or axes along which to average the array a. With the default None, all elements of a are averaged.
weights=None (array-like): An array of weights associated with the values in a. This lets you customize how much each element contributes to the average. The weights must either have the same shape as a or be one-dimensional with a length equal to the size of a along the given axis.
returned=False (Boolean): If False, returns only the average. If True, returns the tuple (average, sum_of_weights) so that you can easily normalize the weighted average.
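To see all four parameters in action, here is a small sketch on a made-up 2x3 array. The array data and the weights [3, 1, 0] are illustrative values chosen for this example, not taken from the article:

```python
import numpy as np

# Illustrative 2x3 data set (made-up values)
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Default: average over all six values -> 21 / 6 = 3.5
print(np.average(data))          # 3.5

# axis=0 averages down the columns, axis=1 across the rows
print(np.average(data, axis=0))  # [2.5 3.5 4.5]
print(np.average(data, axis=1))  # [2. 5.]

# 1D weights along axis=1: one weight per column (illustrative choice)
# returned=True also gives back the sum of the weights for each row
avg, wsum = np.average(data, axis=1, weights=[3, 1, 0], returned=True)
print(avg)   # [1.25 4.25]
print(wsum)  # [4. 4.]
```

Row 0 becomes (3·1 + 1·2 + 0·3) / 4 = 1.25 and row 1 becomes (3·4 + 1·5 + 0·6) / 4 = 4.25; the third column has weight 0, so it is ignored entirely.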

Here’s a short summary of the np.average() function:

NumPy’s average function computes the average of all numerical values in a NumPy array. When called with only the array argument, it simply calculates the arithmetic mean of all values in the array, no matter the array’s dimensionality. For example, the expression np.average([[1,2],[2,3]]) results in the average value (1+2+2+3)/4 = 2.0.
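A quick sketch confirms that the nesting of the input does not matter when no axis is given. The names nested, flat, and cube are just illustrative:

```python
import numpy as np

# With no axis or weights, np.average() averages every value,
# so how the values are nested does not change the result
nested = [[1, 2], [2, 3]]
flat = [1, 2, 2, 3]
cube = np.arange(8).reshape(2, 2, 2)  # the values 0..7 as a 2x2x2 array

print(np.average(nested))  # 2.0
print(np.average(flat))    # 2.0
print(np.average(cube))    # 3.5

# Without weights, the result matches np.mean()
print(np.average(cube) == np.mean(cube))  # True
```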

How to Calculate the Weighted Average of a NumPy Array in Python?

However, what if you want to calculate the weighted average of a NumPy array? In other words, you want to overweight some array values and underweight others.

You can easily accomplish this by passing the weights argument to NumPy’s average function:

import numpy as np

a = [-1, 1, 2, 2]

print(np.average(a))
# 1.0

print(np.average(a, weights=[1, 1, 1, 5]))
# 1.5

In the first example, we simply averaged over all array values: (-1+1+2+2)/4 = 1.0. However, in the second example, we overweight the last array element 2—it now carries five times the weight of the other elements resulting in the following computation: (-1+1+2+(2+2+2+2+2))/8 = 1.5.
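In general, the weighted average is the sum of each value times its weight, divided by the sum of the weights. This sketch checks that by hand against np.average() and also shows what returned=True hands back:

```python
import numpy as np

a = np.array([-1, 1, 2, 2])
w = np.array([1, 1, 1, 5])

# Weighted average by hand: sum(w_i * a_i) / sum(w_i) = 12 / 8
manual = np.sum(w * a) / np.sum(w)

# returned=True returns the average together with the sum of the weights
avg, weight_sum = np.average(a, weights=w, returned=True)

print(manual)      # 1.5
print(avg)         # 1.5
print(weight_sum)  # 8.0
```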

How to Average Along an Axis?

Extracting basic statistics from matrices (e.g., average, variance, standard deviation) is a critical component of analyzing a wide range of data sets, such as financial data, health data, or social media data. With the rise of machine learning and data science, a solid command of NumPy’s linear algebra operators becomes more and more valuable in the marketplace.

In the following, you’ll learn how to average along an axis. Here is how you can do it in NumPy:

import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x, axis=1))
# [3. 1. 2.]

NumPy internally represents data using NumPy arrays (np.ndarray objects, usually created with np.array()). These arrays can have an arbitrary number of dimensions. The example above uses a two-dimensional NumPy array.

In practice, arrays can have much higher dimensionality. You can quickly identify the dimensionality of a NumPy array by counting the consecutive opening brackets “[” at the start of the array literal. The more formal alternative is to use the ndim attribute.
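Here is a short sketch comparing the bracket-counting trick with the ndim and shape attributes. The second array y is an illustrative 3D example:

```python
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

# Two opening brackets "[[" -> two dimensions
print(x.ndim)   # 2
print(x.shape)  # (3, 3)

# Three opening brackets "[[[" -> three dimensions (illustrative array)
y = np.array([[[1, 2], [1, 1]],
              [[1, 1], [2, 1]]])
print(y.ndim)   # 3
print(y.shape)  # (2, 2, 2)
```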

Each dimension has its own axis identifier. As a rule of thumb: the outermost dimension has the identifier “0”, the second-outermost dimension has the identifier “1”, and so on.
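To make the axis identifiers concrete, this sketch averages the same 3x3 array along axis 0 and axis 1. It also uses a negative index, which NumPy counts from the innermost axis:

```python
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

# axis=0 (outermost): collapse the rows, one average per column
col_avg = np.average(x, axis=0)  # column averages 2/3, 2, 10/3

# axis=1 (second-outermost): collapse the columns, one average per row
row_avg = np.average(x, axis=1)
print(row_avg)                   # [3. 1. 2.]

# For a 2D array, axis=-1 refers to the same axis as axis=1
print(np.average(x, axis=-1))    # [3. 1. 2.]
print(col_avg)
```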

By default, the NumPy average function aggregates all the values in a NumPy array into a single value:

import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x))
# 2.0

Here, the simple average of the array is calculated as follows:

(1+3+5+1+1+1+0+2+4)/9 = 18/9 = 2.0

Calculating Average, Variance, Standard Deviation Along an Axis

However, sometimes you want to average along an axis.

For example, you may work at a large financial corporation and want to calculate the average price of each stock, given a large matrix of stock prices (rows = different stocks, columns = daily stock prices).

Here is how you can do this by specifying the keyword “axis” as an argument to the average function:

import numpy as np

## Stock Price Data: 5 companies
# (row=[price_day_1, price_day_2, ...])
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1], 
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

avg = np.average(x, axis=1)

print("Averages: " + str(avg))
# Averages: [10.   1.5  7.   6.   3. ]

Note that you perform the function along axis=1, i.e., this is the axis that is aggregated to a single value: the daily prices in each row collapse into one average per stock. Hence, the resulting NumPy array has a reduced dimensionality.
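The section heading also mentions variance and standard deviation, which work the same way with np.var() and np.std(). This sketch reuses the stock matrix, shows the shape reduction, and adds a weighted variant. The weights [1, 2, 3, 4] (later days count more) are an illustrative assumption, not part of the original example:

```python
import numpy as np

# Stock price matrix from the example above: 5 stocks (rows) x 4 days (columns)
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1],
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

avg = np.average(x, axis=1)
print(x.shape, "->", avg.shape)  # (5, 4) -> (5,)

# Variance and standard deviation per stock, again along axis=1.
# np.var() and np.std() default to ddof=0 (population statistics).
var = np.var(x, axis=1)  # 2.5, 0.25, 8.5, 4.5, 0.0
std = np.std(x, axis=1)  # square roots of the variances

# Illustrative weighting (assumption): one weight per day, later days count more
recent_avg = np.average(x, axis=1, weights=[1, 2, 3, 4])  # 10.7, 1.5, 8.1, 5.1, 3.0

print(var)
print(std)
print(recent_avg)
```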

High-dimensional Averaging Along An Axis

Of course, you can also perform this averaging along an axis for high-dimensional NumPy arrays. Conceptually, you’ll always aggregate the axis you specify as an argument.

Here is an example:

import numpy as np

x = np.array([[[1,2], [1,1]],
              [[1,1], [2,1]],
              [[1,0], [0,0]]])

print(np.average(x, axis=2))
# [[1.5 1. ]
#  [1.  1.5]
#  [0.5 0. ]]
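Since axis also accepts a tuple of ints, you can collapse several axes at once. This sketch reuses the 3D array from above and prints the resulting shapes:

```python
import numpy as np

x = np.array([[[1, 2], [1, 1]],
              [[1, 1], [2, 1]],
              [[1, 0], [0, 0]]])
print(x.shape)                      # (3, 2, 2)

# Collapsing a single axis drops exactly that dimension
print(np.average(x, axis=0).shape)  # (2, 2)
print(np.average(x, axis=2).shape)  # (3, 2)

# A tuple collapses several axes at once: one average per 2x2 block
print(np.average(x, axis=(1, 2)))   # [1.25 1.25 0.25]
```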

NumPy Average Puzzle

Puzzles are a great way to test and train your coding skills. Have a look at the following puzzle:

import numpy as np

# Goals in five matches
goals_brazil = np.array([1,2,3,1,2])
goals_germany = np.array([1,0,1,2,0])

br = np.average(goals_brazil)
ge = np.average(goals_germany)

Exercise: What is the output of this puzzle?
*Beginner Level*


This puzzle introduces one new feature of the NumPy library: the average function. When applied to a 1D array, this function returns the average value of the array.

In the puzzle, Brazil’s average over its last five games is 1.8 goals and Germany’s is 0.8 goals. On average, Brazil scored one more goal per game.
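If you want to double-check these numbers, the sketch below recomputes both averages from the puzzle data. The print calls are added here only for the check:

```python
import numpy as np

# Goals in five matches (same data as the puzzle)
goals_brazil = np.array([1, 2, 3, 1, 2])
goals_germany = np.array([1, 0, 1, 2, 0])

br = np.average(goals_brazil)   # 9 / 5
ge = np.average(goals_germany)  # 4 / 5

print(br, ge)  # 1.8 0.8
print(br - ge)
```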
