NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices. It’s at the core of data science and machine learning in Python. In today’s article, you’re going to master NumPy’s impressive average() function that will be a loyal friend to you when fighting your upcoming data science battles.
average(a, axis=None, weights=None, returned=False)
Argument | Description |
---|---|
a | array-like: The array containing the data to be averaged. Can be multi-dimensional and doesn’t have to be a NumPy array, although it usually is. |
axis=None | None or int or tuple of ints: The axis or axes along which to average the array a. With the default None, all values in the array are averaged. |
weights=None | array-like: An array of weights associated with the values in the array a. This allows you to customize how much each element contributes to the average. |
returned=False | Boolean: If False, returns the average value. If True, returns the tuple (average, sum_of_weights) so that you can easily normalize the weighted average. |
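As a minimal sketch of the returned flag described in the table, the following snippet unpacks the tuple that np.average() gives back when returned=True. The values and weights are purely illustrative.

```python
import numpy as np

# Illustrative values and weights (not taken from a real data set)
values = [-1, 1, 2, 2]
weights = [1, 1, 1, 5]

# With returned=True, np.average() returns (average, sum_of_weights)
avg, weight_sum = np.average(values, weights=weights, returned=True)

print(avg)         # 1.5
print(weight_sum)  # 8.0
```

The second element is the sum of all weights, which is handy if you later want to combine several weighted averages.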
Here’s a short summary of the np.average() function:
NumPy’s average function computes the average of all numerical values in a NumPy array. When called without the axis argument, it simply calculates the numerical average of all values in the array, no matter the array’s dimensionality. For example, the expression np.average([[1,2],[2,3]]) results in the average value (1+2+2+3)/4 = 2.0.
How to Calculate the Weighted Average of a NumPy Array in Python?
However, what if you want to calculate the weighted average of a NumPy array? In other words, you want to overweight some array values and underweight others.
You can easily accomplish this by passing the weights argument to NumPy’s average function.
import numpy as np

a = [-1, 1, 2, 2]

print(np.average(a))
# 1.0

print(np.average(a, weights=[1, 1, 1, 5]))
# 1.5
In the first example, we simply averaged over all array values: (-1+1+2+2)/4 = 1.0. In the second example, however, we overweight the last array element 2. It now carries five times the weight of each other element, resulting in the following computation: (-1+1+2+(2+2+2+2+2))/8 = 1.5.
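To make this computation explicit, here is a small sketch that reproduces the weighted average by hand as sum(weight * value) / sum(weights) and compares it with np.average(). The variable names are illustrative.

```python
import numpy as np

a = np.array([-1, 1, 2, 2])
w = np.array([1, 1, 1, 5])

# Weighted average by hand: sum of weight * value, divided by the sum of weights
manual = np.sum(a * w) / np.sum(w)

# The same computation with NumPy's average function
builtin = np.average(a, weights=w)

print(manual)   # 1.5
print(builtin)  # 1.5
```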
How to Average Along an Axis?
Extracting basic statistics from matrices (e.g. average, variance, standard deviation) is a critical part of analyzing a wide range of data sets such as financial data, health data, or social media data. With the rise of machine learning and data science, your proficiency with linear algebra operations in NumPy becomes more and more valuable in the marketplace.
In the following, you’ll learn how to average along an axis. Here’s what you want to achieve:
Here is how you can average along an axis in NumPy:
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x, axis=1))
# [3. 1. 2.]
NumPy internally represents data using NumPy arrays (created with np.array()). These arrays can have an arbitrary number of dimensions. In the figure above, we show a two-dimensional NumPy array.
In practice, the array can have much higher dimensionality. You can quickly identify the dimensionality of a NumPy array by counting the number of opening brackets “[” at the start of the array when creating it. The more formal alternative would be to use the ndim attribute.
Each dimension has its own axis identifier. As a rule of thumb: the outermost dimension has the identifier “0”, the second-outermost dimension has the identifier “1”, and so on.
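The following sketch shows how the axis identifiers map to a two-dimensional array. It checks ndim and shape, then averages once along axis 0 and once along axis 1. The names col_avg and row_avg are only illustrative.

```python
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(x.ndim)   # 2
print(x.shape)  # (3, 3)

# axis=0 aggregates the outermost dimension: one value per column
col_avg = np.average(x, axis=0)

# axis=1 aggregates the second dimension: one value per row
row_avg = np.average(x, axis=1)

print(col_avg.shape)  # (3,)
print(row_avg)        # [3. 1. 2.]
```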
By default, the NumPy average function aggregates all the values in a NumPy array to a single value:
import numpy as np

x = np.array([[1, 3, 5],
              [1, 1, 1],
              [0, 2, 4]])

print(np.average(x))
# 2.0
For example, the simple average of a NumPy array is calculated as follows:
(1+3+5+1+1+1+0+2+4)/9 = 18/9 = 2.0
Calculating Average, Variance, Standard Deviation Along an Axis
However, sometimes you want to average along an axis.
For example, you may work at a large financial corporation and want to calculate the average value of a stock price — given a large matrix of stock prices (rows = different stocks, columns = daily stock prices).
Here is how you can do this by specifying the keyword “axis” as an argument to the average function:
import numpy as np

## Stock Price Data: 5 companies
# (row=[price_day_1, price_day_2, ...])
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1],
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

avg = np.average(x, axis=1)

print("Averages: " + str(avg))

"""
Averages: [10.   1.5  7.   6.   3. ]
"""
Note that you apply the function along axis=1, i.e., this is the axis that is aggregated to a single value. Here, each row (one stock) collapses to its average price. Hence, the resulting NumPy array has a reduced dimensionality.
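The axis argument works the same way for the variance and standard deviation named in the heading. As a sketch, NumPy’s np.var() and np.std() functions can be applied to the same stock price matrix. Note that both compute the population statistic by default (ddof=0), which is an assumption you may want to change for sample data.

```python
import numpy as np

# Same stock price data as above: 5 companies, 4 daily prices each
x = np.array([[8, 9, 11, 12],
              [1, 2, 2, 1],
              [2, 8, 9, 9],
              [9, 6, 6, 3],
              [3, 3, 3, 3]])

# Aggregate along axis=1 to get one value per stock (row)
var = np.var(x, axis=1)
std = np.std(x, axis=1)

print("Variances: " + str(var))
print("Standard deviations: " + str(std))
```

For example, the first stock has an average of 10, so its variance is ((-2)² + (-1)² + 1² + 2²)/4 = 2.5.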
High-dimensional Averaging Along An Axis
Of course, you can also perform this averaging along an axis for high-dimensional NumPy arrays. Conceptually, you’ll always aggregate the axis you specify as an argument.
Here is an example:
import numpy as np

x = np.array([[[1,2], [1,1]],
              [[1,1], [2,1]],
              [[1,0], [0,0]]])

print(np.average(x, axis=2))

"""
[[1.5 1. ]
 [1.  1.5]
 [0.5 0. ]]
"""
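Since the axis argument also accepts a tuple of ints, you can aggregate several axes at once. The following sketch reuses the three-dimensional array from the example and shows how the shape of the result changes. The variable names are illustrative.

```python
import numpy as np

x = np.array([[[1, 2], [1, 1]],
              [[1, 1], [2, 1]],
              [[1, 0], [0, 0]]])

print(x.shape)  # (3, 2, 2)

# Averaging a single axis removes that axis from the shape
avg_axis0 = np.average(x, axis=0)
print(avg_axis0.shape)  # (2, 2)

# Averaging axes 1 and 2 together leaves one value per outer block
block_avg = np.average(x, axis=(1, 2))
print(block_avg)  # [1.25 1.25 0.25]
```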
NumPy Average Puzzle
Puzzles are a great way to test and train your coding skills. Have a look at the following puzzle:
import numpy as np

# Goals in five matches
goals_brazil = np.array([1,2,3,1,2])
goals_germany = np.array([1,0,1,2,0])

br = np.average(goals_brazil)
ge = np.average(goals_germany)

print(br>ge)
Exercise: What is the output of this puzzle?
*Beginner Level*
This puzzle introduces one new feature of the NumPy library: the average function. When applied to a 1D array, this function returns the average value of the array.
In the puzzle, the average of the goals of the last five games of Brazil is 1.8 and of Germany is 0.8. On average, Brazil scored one more goal per game, so the puzzle prints True.