Problem Formulation: How to calculate the standard deviation in NumPy?
Variants: There are several variants of this problem:
- Calculate the standard deviation of a 1D array
- Calculate the standard deviation of a 2D array
- Calculate the standard deviation of a 3D array
You can also calculate the standard deviation along an axis:
- Calculate the standard deviation of a 2D array along the columns
- Calculate the standard deviation of a 2D array along the rows
All of them use the np.std(array, axis) function, which can be customized to the problem at hand.
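To make explicit what np.std computes: by default it returns the population standard deviation, the square root of the mean squared deviation from the mean (this matches the formula in the NumPy documentation). The minimal sketch below compares a hand-rolled version with np.std; the variable names manual and builtin are only illustrative.

```python
import numpy as np

arr = np.array([0, 10, 0])

# Population standard deviation by hand: square root of the mean squared deviation
manual = np.sqrt(np.mean((arr - arr.mean()) ** 2))

# NumPy's built-in function (default ddof=0, i.e. divide by N)
builtin = np.std(arr)

print(manual, builtin)
print(np.isclose(manual, builtin))  # True
```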
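Syntax: np.std(array, axis=None)

| Role | Name | Description |
|---|---|---|
| Argument | array (array-like) | Array for which the standard deviation should be calculated. |
| Argument | axis (optional) | Axis (or tuple of axes) along which the standard deviation should be calculated. Defaults to None, which means the standard deviation of the flattened array is computed. |
| Return value | number or array | If no axis argument is given (axis=None), returns a single number. Otherwise returns the standard deviations along the given axis; the result has one dimension fewer than the input (so a 1D input still yields a single number). |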
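A quick way to check the return-value rule is to inspect the shapes of the results. The sketch below uses a 2×3 array; keepdims is one of np.std's additional optional arguments and keeps the reduced axis with length 1, which is handy when you want to broadcast the result back against the input.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [1, 1, 1]])  # shape (2, 3)

whole = np.std(arr)                        # axis=None: one number for the flattened array
per_col = np.std(arr, axis=0)              # one value per column -> shape (3,)
per_row = np.std(arr, axis=1)              # one value per row -> shape (2,)
kept = np.std(arr, axis=1, keepdims=True)  # reduced axis kept with length 1 -> shape (2, 1)

print(np.ndim(whole), per_col.shape, per_row.shape, kept.shape)
# 0 (3,) (2,) (2, 1)
```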
Before we dive into the different ways to calculate the standard deviation in NumPy, let me quickly point out that np.std accepts additional optional arguments (such as ddof, dtype, out, and keepdims), but most of them are rarely used. You can check them out in the official NumPy documentation for np.std.
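Of the optional arguments, ddof ("delta degrees of freedom") is the one you are most likely to need. By default np.std divides by N, giving the population standard deviation; with ddof=1 it divides by N - 1, giving the sample standard deviation used in many statistics texts. A minimal sketch:

```python
import numpy as np

arr = np.array([0, 10, 0])

population = np.std(arr)      # ddof=0 (default): divide by N
sample = np.std(arr, ddof=1)  # ddof=1: divide by N - 1 (sample standard deviation)

print(population)
print(sample)
```

Because N - 1 < N, the sample version is always at least as large as the population version.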
How to calculate the standard deviation of a 1D array
import numpy as np

arr = np.array([0, 10, 0])
dev = np.std(arr)
print(dev)
# 4.714045207910316
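If you want to sanity-check a 1D result without relying on NumPy, Python's standard-library statistics module provides pstdev, which also computes the population standard deviation (the counterpart of np.std's default ddof=0). A small cross-check, allowing for tiny floating-point differences:

```python
import statistics

import numpy as np

values = [0, 10, 0]

numpy_dev = np.std(np.array(values))
stdlib_dev = statistics.pstdev(values)  # population standard deviation from the standard library

print(numpy_dev, stdlib_dev)
print(abs(numpy_dev - stdlib_dev) < 1e-12)  # True
```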
How to calculate the standard deviation of a 2D array
import numpy as np

arr = np.array([[1, 2, 3], [1, 1, 1]])
dev = np.std(arr)
print(dev)
# 0.7637626158259734
How to calculate the standard deviation of a 3D array
import numpy as np

arr = np.array([[[1, 1], [0, 0]], [[0, 0], [0, 0]]])
dev = np.std(arr)
print(dev)
# 0.4330127018922193
You can pass an n-dimensional array, and by default (axis=None) NumPy calculates the standard deviation of the flattened array.
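The flattening behavior can be verified directly by comparing against arr.ravel(). The axis argument also accepts a tuple of axes, which reduces several axes at once; in the sketch below, axis=(1, 2) yields one standard deviation per outer 2×2 block of the 3D array from the previous example.

```python
import numpy as np

arr = np.array([[[1, 1], [0, 0]],
                [[0, 0], [0, 0]]])  # shape (2, 2, 2)

# Default axis=None gives the same result as flattening first
print(np.isclose(np.std(arr), np.std(arr.ravel())))  # True

# Tuple of axes: block 0 holds 1, 1, 0, 0 (std 0.5); block 1 is all zeros (std 0)
per_block = np.std(arr, axis=(1, 2))
print(per_block.shape)  # (2,)
```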
How to calculate the standard deviation of a 2D array along the columns
import numpy as np

matrix = [[1, 2, 3], [2, 2, 2]]

# calculate standard deviation along columns
y = np.std(matrix, axis=0)
print(y)
# [0.5 0.  0.5]
How to calculate the standard deviation of a 2D array along the rows
import numpy as np

matrix = [[1, 2, 3], [2, 2, 2]]

# calculate standard deviation along rows
z = np.std(matrix, axis=1)
print(z)
# [0.81649658 0.        ]
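A useful way to remember which axis is which: the axis you pass is the one that disappears. axis=0 collapses the rows and leaves one value per column; axis=1 collapses the columns and leaves one value per row. The sketch below recomputes both results with explicit loops to confirm this reading.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [2, 2, 2]])

# axis=0 collapses the rows: one standard deviation per column
cols = np.array([np.std(matrix[:, j]) for j in range(matrix.shape[1])])

# axis=1 collapses the columns: one standard deviation per row
rows = np.array([np.std(matrix[i, :]) for i in range(matrix.shape[0])])

print(np.allclose(cols, np.std(matrix, axis=0)))  # True
print(np.allclose(rows, np.std(matrix, axis=1)))  # True
```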
Data Science NumPy Puzzle
import numpy as np

# daily stock prices
# [open, close]
google = np.array(
    [[1239, 1258],   # day 1
     [1262, 1248],   # day 2
     [1181, 1205]])  # day 3

# standard deviation
y = np.std(google, axis=1)
print(y[2] == max(y))
What is the output of this puzzle?
*Advanced Level*
NumPy is a popular Python library for data science focusing on arrays, vectors, and matrices.
This puzzle introduces the standard deviation function of the NumPy library. When applied to a 1D array, this function returns its standard deviation. When applied to a 2D array without an axis argument, NumPy simply flattens the array, and the result is the standard deviation of the flattened 1D array.
In the puzzle, we have a matrix with three rows and two columns. The matrix stores the open and close prices of the Google stock for three consecutive days. The first column specifies the opening price, the second the closing price.
We are interested in the standard deviation for each of the three days: how much do the opening and closing prices deviate from their mean on that day?
NumPy provides this functionality via the axis parameter. In a 2D matrix, axis=0 runs down the rows (yielding one result per column) and axis=1 runs across the columns (yielding one result per row). We want one value per day, i.e., per row, so we compute the standard deviation along axis=1. This results in three standard deviation values, one per day.
Clearly, the third day shows the highest standard deviation (12.0, compared with 9.5 on day 1 and 7.0 on day 2), so the puzzle prints True.
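With only two values per row, the population standard deviation has a simple closed form: the mean lies halfway between the two values, so each deviates from it by half their difference, and the standard deviation is |open - close| / 2. The sketch below checks this against np.std and uses np.argmax to find the most volatile day.

```python
import numpy as np

google = np.array([[1239, 1258],   # day 1: [open, close]
                   [1262, 1248],   # day 2
                   [1181, 1205]])  # day 3

y = np.std(google, axis=1)  # one standard deviation per day

# Two values per row: population std equals half their absolute difference
closed_form = np.abs(google[:, 0] - google[:, 1]) / 2

print(np.allclose(y, closed_form))  # True
print(np.argmax(y))                 # 2 -> the third day
```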