Python Numpy 101: How to Calculate the Row Variance of a Numpy 2D Array?

Daily Data Science Puzzle

import numpy as np

# stock prices (3x per day)
# [morning, midday, evening]
APPLE = np.array(
    [[50,60,55], # day 1
     [60,60,65]]) # day 2

# midday variance
y = np.var(APPLE, axis=0)[1]
print(y)


What is the output of this puzzle?
*Advanced Level* (solution below)

Numpy is a popular Python library for data science focusing on arrays, vectors, and matrices.

This puzzle introduces a new feature of the numpy library: the variance function np.var(). When applied to a 1D numpy array, this function returns the variance of the array values. The variance is the average squared deviation of the values from their mean.
When applied to a 2D numpy array without an axis argument (the default, axis=None), numpy flattens the array first. The result is the variance of the flattened 1D array.
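
The following sketch checks both claims on the puzzle's own matrix: with no axis argument, np.var treats the 2D array as one flat sequence of six values, and the result matches the definition "mean of squared deviations from the mean". Note that np.var uses ddof=0 by default, i.e., it divides by the number of values (population variance). The names flat_var, values, and manual_var are illustrative only.

```python
import numpy as np

APPLE = np.array([[50, 60, 55],
                  [60, 60, 65]])

# Default axis=None: the 2D array is treated as one flat list of 6 values
flat_var = np.var(APPLE)

# The same quantity from the definition: mean of squared deviations from the mean
values = APPLE.flatten()
manual_var = np.mean((values - values.mean()) ** 2)

print(flat_var)                          # about 22.22 (exactly 200/9)
print(np.isclose(flat_var, manual_var))  # True
```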

In the puzzle, we have a matrix with two rows and three columns that holds Apple's stock prices. Each row represents the prices for one day. The first column specifies the morning price, the second the midday price, and the third the evening price.
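
To make the layout concrete, here is a small sketch that inspects the matrix: its shape, one full row (a day), and one full column (a time of day). Selecting a column uses standard numpy slicing, APPLE[:, 1].

```python
import numpy as np

# Rows = days, columns = [morning, midday, evening]
APPLE = np.array([[50, 60, 55],   # day 1
                  [60, 60, 65]])  # day 2

print(APPLE.shape)   # (2, 3): 2 rows (days), 3 columns (times of day)
print(APPLE[0])      # [50 60 55]: all prices of day 1
print(APPLE[:, 1])   # [60 60]: the midday price of each day
```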

However, we do not want the variance of the flattened matrix but the variance of the midday price. On both days, the midday price was $60. Every value equals the mean, so every deviation is zero. Therefore, the variance is 0, and y is 0.0.
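
As a cross-check, this sketch computes the midday variance directly from the midday column, once by the definition and once with np.var. It is an alternative route to the same number, not the approach the puzzle uses.

```python
import numpy as np

APPLE = np.array([[50, 60, 55],
                  [60, 60, 65]])

midday = APPLE[:, 1]                  # [60, 60]
deviations = midday - midday.mean()   # [0., 0.]: both prices equal the mean

print(np.mean(deviations ** 2))  # 0.0, by the definition of variance
print(np.var(midday))            # 0.0, same result via np.var
```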

Numpy provides this functionality via the axis parameter. In a 2D array, axis=0 refers to the row dimension and axis=1 to the column dimension. Passing axis=0 computes the variance across the rows, i.e., down each column, and collapses the row dimension, so you get one value per column. Passing axis=1 instead gives one value per row. We want three variances, one each for the morning, midday, and evening prices, so we use axis=0. This yields three variance values, and indexing with [1] selects the second one: the midday variance.
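
The sketch below contrasts the two axis settings on the puzzle's matrix. With axis=0, the per-column variances are 25 for the morning (50 and 60), 0 for midday, and 25 for the evening (55 and 65). With axis=1, each day's three prices are summarized instead. The names per_column and per_row are illustrative.

```python
import numpy as np

APPLE = np.array([[50, 60, 55],
                  [60, 60, 65]])

per_column = np.var(APPLE, axis=0)  # collapses the rows: one value per column
per_row = np.var(APPLE, axis=1)     # collapses the columns: one value per row

print(per_column)     # [25.  0. 25.] -> morning, midday, evening
print(per_column[1])  # 0.0 -> midday variance, as in the puzzle
print(per_row)        # about [16.67, 5.56] -> day 1, day 2
```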
