Definition np.cumsum(x): The function computes the cumulative sum of a NumPy array. For an array x with elements [a b c d], the cumulative sum is [a a+b a+b+c a+b+c+d]. Formally, the output element with index i is the sum of all input elements with index j ≤ i.
numpy.cumsum(a, axis=None, dtype=None, out=None)
- a — Array-like data type. Input array of the function
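- axis — Integer value. The axis along which you want to compute the cumulative sum. By default, you’ll compute the cumulative sum over the flattened array.
- dtype — Return array type. Also the type of the accumulated sum. By default, it’s the dtype of array a. The exception is an integer dtype with less precision than the platform’s default integer, in which case the default integer is used.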
- out — NumPy array. If you want to store your result in an alternative array, use this argument.
Next, you’ll learn everything you need to know about np.cumsum(). So keep reading!
What is the NumPy cumsum() Function?
Given an input array, NumPy’s cumsum() function calculates the cumulative sum of the values in the array. It produces a new array as a result.
It is important to emphasize the difference between the cumulative sum and the sum:
It might seem intuitive that a cumulative sum is a single number obtained by aggregation, but this is not the case: that would be the sum of the numbers in an array. For example, the sum of the numbers from 1 to 5 is 1+2+3+4+5 = 15. The sum represents the “total”; it aggregates the data in the array to a single number.
On the other hand, the cumulative sum would be the “running total”. Let’s say that you want to keep track of your total savings in a spreadsheet. Before you add a new amount to the savings, you want to know the previous total. For example, the first week you save $100. After the first week, you will have $100 in your savings. The second week you add another $100. After the second week, you will have $200 and so on.
If we have an array with elements (a, b, c, d) the cumulative sum is (a, a+b, a+b+c, a+b+c+d).
Here is the example that calculates the cumulative sum for the savings account.
# import NumPy library
# we assume that this has already been done in the future examples
import numpy as np

# create an array that represents our savings each week over two months
savings = np.array([[100, 200, 150, 220], [300, 200, 150, 100]])

# calculate the cumulative sum
cumsum = np.cumsum(savings)
print(cumsum)
# [ 100  300  450  670  970 1170 1320 1420]
We can see that after the first week we had $100, after the second week we had $300 and so on. After two months, we had $1420 in our savings.
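To make the “running total” idea concrete, here is a minimal sketch that builds the same numbers with a plain Python loop. The names weekly_savings, running_total and totals are chosen for this illustration only; the loop is meant to show what the cumulative sum means, not how NumPy computes it internally.

```python
import numpy as np

# the eight weekly amounts from the savings example, as a flat list
weekly_savings = [100, 200, 150, 220, 300, 200, 150, 100]

running_total = 0
totals = []
for amount in weekly_savings:
    running_total += amount        # add this week's amount to the total so far
    totals.append(running_total)   # record the running total after this week

print(totals)                      # one value per week (the cumulative sum)
print(sum(weekly_savings))         # a single number (the sum)

# np.cumsum gives the same running totals in one call
print(np.cumsum(weekly_savings).tolist() == totals)
```

The last element of the cumulative sum always equals the plain sum of the array.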
The Syntax of np.cumsum()
Let’s have a look at the general syntax:
np.cumsum(array, axis=None, dtype=None, out=None)
The function has the following arguments:
- The input array can be any NumPy array, one-dimensional (“flat”) or multi-dimensional.
- The argument axis is None by default. If unspecified, the function computes the cumulative sum over the flattened array. Otherwise, the axis argument can be 0, 1, 2, … depending on the array dimension. In this case, we calculate the cumulative sum along the specified axis. This is an optional argument.
- The argument dtype specifies the type of the returned array. This is an optional argument, and if it is not specified then it takes the type of the input array.
- The argument out is an optional argument. It defines the output array in which the result of the function should be placed. If unspecified, a new array is created.
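The dtype and out arguments are easiest to understand by trying them. The following is a small sketch; the names values and buffer are made up for this example, and the buffer is assumed to have the same shape as the result.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# dtype: accumulate and return the result as floats instead of integers
as_float = np.cumsum(values, dtype=float)
print(as_float.dtype)

# out: write the result into an array that already exists
# (`buffer` is a hypothetical name; it must match the shape of the result)
buffer = np.empty(4, dtype=values.dtype)
returned = np.cumsum(values, out=buffer)
print(buffer)

# when out is given, the function returns a reference to that same array
print(returned is buffer)
```

Passing out can be useful when you want to avoid allocating a new array for every call, for example inside a loop.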
NumPy cumsum() Axes
To understand how the cumsum() function works, we need a good understanding of NumPy axes. NumPy arrays can be one-dimensional or multi-dimensional.
Cumulative Sum of a Flattened Array (1-D)
One dimensional arrays are denoted as “flat”:
“The one-dimensional array is a row vector and its shape is a single value iterable followed by a comma. One-dimensional arrays don’t have rows and columns, so the shape attribute returns a single value tuple.” (“The Ultimate Guide to NumPy Reshape() in Python”)
One-dimensional arrays only have a single axis (specified as axis=0), so the cumulative sum always runs along that axis:
# create an array
one_D_arr = np.arange(10)
print(one_D_arr)
# [0 1 2 3 4 5 6 7 8 9]

# cumulative sum
cumsum = np.cumsum(one_D_arr)
print(cumsum)
# [ 0  1  3  6 10 15 21 28 36 45]
So when dealing with one-dimensional arrays, you don’t need to define the axis argument to calculate the cumulative sum with NumPy.
Cumulative Sum of a Matrix (2D array)
A two-dimensional array corresponds to a matrix with rows and columns. Axis 0 runs vertically, down the rows of the matrix. Axis 1 runs horizontally, across the columns of the matrix.
The axes start at 0 like indices of Python lists. If we don’t specify the axis, the cumulative sum results in a 1-D array. NumPy will flatten the input array.
Here is an example of a 2-D array without a specified axis:
# 2-D array
two_D_arr = np.array([[1, 2, 3], [4, 5, 6]])
cumsum = np.cumsum(two_D_arr)
print(cumsum)
# [ 1  3  6 10 15 21]
Now, let’s see how we would get the cumulative sum over axis 0:
# 2-D array
two_D_arr = np.array([[1, 2, 3], [4, 5, 6]])
cumsum = np.cumsum(two_D_arr, axis=0)
print(cumsum)
# [[1 2 3]
#  [5 7 9]]
The first row, [1,2,3], stays the same. Recall the savings example: if you saved $100 the first week, the cumulative sum after that first week will be that $100.
We get the second row by adding the same indices from each row:
[1+4, 2+5, 3+6] = [5, 7, 9]
Finally, let’s see what happens when we calculate the cumulative sum over axis 1.
# 2-D array
two_D_arr = np.array([[1, 2, 3], [4, 5, 6]])
cumsum = np.cumsum(two_D_arr, axis=1)
print(cumsum)
# [[ 1  3  6]
#  [ 4  9 15]]
Here the summation is happening “inside” of each row.
1st row [ 1, 3, 6] = [1, 1+2, 1+2+3]
2nd row [ 4, 9, 15] = [4, 4+5, 4+5+6]
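The axis logic can also be read back into the savings example. In the sketch below, the rows of the array are assumed to be months and the columns weeks (that interpretation, and the names within_month and across_months, are choices made for this illustration).

```python
import numpy as np

# rows = months, columns = weeks (same numbers as the savings example)
savings = np.array([[100, 200, 150, 220],
                    [300, 200, 150, 100]])

# axis=1: running total over the weeks, restarted for each month
within_month = np.cumsum(savings, axis=1)
print(within_month)

# axis=0: running total down each column, week by week across the months
across_months = np.cumsum(savings, axis=0)
print(across_months)

# with an explicit axis, the result keeps the shape of the input
print(within_month.shape == savings.shape)
```

Only the default axis=None flattens the array; with an explicit axis the output has the same shape as the input.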
What’s the Difference Between Pandas cumsum() and NumPy cumsum()?
There is also a cumsum() function in the pandas library. I will briefly mention that the main data structure in pandas is a data frame. In a way, it’s like a 2-D array because it contains rows and columns. Unlike a 2-D array, a data frame is the Python equivalent of an Excel spreadsheet, with an index column and a header row. A pandas series is similar to a 1-D array, as it is a 1-D object.
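In pandas, cumsum() is a method of series and data frames, so you call it as series.cumsum().
The main difference between the NumPy cumsum() and pandas cumsum() functions is that pandas cumsum() works with missing values (NaN). Its skipna argument is True by default, so NaN entries are skipped and the cumulative sum of the remaining values will be exactly what you would expect it to be. The output still contains NaN at the positions where the input is NaN. NumPy’s cumsum(), by contrast, follows the rule that anything added to a NaN value produces another NaN value, so every element after the first NaN is NaN as well. If elements in the original series are integers, but there is at least one NaN value, the elements in the cumulative sum series will be of the float type: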
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3, np.nan])
cumsum = series.cumsum()
print(cumsum)
'''
0    1.0
1    3.0
2    6.0
3    NaN
dtype: float64
'''
The cumsum() function sums up the values in the pandas series:
1
1+2 = 3
3+3 = 6
NaN (the missing value is skipped and stays NaN in the output)
After conversion to the float data type, we obtain the resulting pandas series.
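The difference in NaN handling shows up most clearly when the NaN is not the last element. The sketch below compares the two libraries on a made-up three-element list (data and the result names are chosen for this example); it also uses np.nancumsum(), NumPy’s variant that treats NaN as zero.

```python
import numpy as np
import pandas as pd

data = [1.0, np.nan, 3.0]

# NumPy: once a NaN is reached, every later element is NaN too
numpy_result = np.cumsum(data)

# NumPy's NaN-aware variant: NaN is treated as zero
nan_as_zero = np.nancumsum(data)

# pandas (skipna=True by default): NaN stays NaN, the total continues after it
pandas_result = pd.Series(data).cumsum()

print(numpy_result)
print(nan_as_zero)
print(pandas_result.tolist())
```

Which behavior you want depends on whether a missing value should invalidate everything after it or simply be ignored.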
NumPy cumprod() Function
It is good to know that there exists a NumPy cumulative product function, cumprod().
Now that we understand what cumsum() does, explaining what cumprod() does is straightforward. The function calculates the cumulative product along an axis. I will not be going into any more detail about cumprod() in this blog post.
The syntax is numpy.cumprod(array, axis=None, dtype=None, out=None).
Consider the following examples:
# 2-D array
two_D_arr = np.array([[1, 2, 3], [4, 5, 6]])
cumprod = np.cumprod(two_D_arr)
print(cumprod)
# [  1   2   6  24 120 720]
The same axes logic that applies to cumsum() applies to cumprod().
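As a quick check of that axes logic, here is a short sketch that runs cumprod() on the same 2-D array along each axis (the names down_rows and along_rows are chosen for this example).

```python
import numpy as np

two_D_arr = np.array([[1, 2, 3],
                      [4, 5, 6]])

# axis=0: multiply down each column -> second row is [1*4, 2*5, 3*6]
down_rows = np.cumprod(two_D_arr, axis=0)
print(down_rows)

# axis=1: multiply along each row -> [1, 1*2, 1*2*3] and [4, 4*5, 4*5*6]
along_rows = np.cumprod(two_D_arr, axis=1)
print(along_rows)
```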
Let’s finish up with some examples.
Number of Subscribers
You want to run a report and see how many new subscribers your company had over the past year. The data is collected on the 1st day of every month at midnight.
Your task is to determine how the total number of subscribers developed from month to month, and to establish the overall trend. You can assume that nobody cancels the subscription.
Here is the number of new subscribers for each month over the past year.
| Month     | Subscribers |
|:---------:|:-----------:|
| August    | 347         |
| September | 326         |
| October   | 389         |
| November  | 405         |
| December  | 476         |
| January   | 474         |
| February  | 602         |
| March     | 626         |
| April     | 699         |
| May       | 817         |
| June      | 812         |
| July      | 963         |
Let’s plot your findings and make conclusions based on the plotted data.
# import libraries
import numpy as np
import matplotlib.pyplot as plt

subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])
cumulative_sum = np.cumsum(subscribers, dtype=int)

plt.plot(subscribers, color='g', label='subscribers')
plt.plot(cumulative_sum, color='orange', label='cumulative sum')
plt.legend(loc='upper left')
plt.show()
Executing this code that makes use of the np.cumsum() function results in the following plot:
The number of new subscribers seems to grow linearly. Because of the accumulation effect, the cumulative sum of the subscribers grows quadratically.
Let’s say that we have an array [a, b, c, d] and we want to compute [d+c+b+a, d+c+b, d+c, d]. We are going to call this a “reverse cumulative sum”. For our input array, we will use the subscribers array from the previous example.
subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])
reverse_cumsum = np.cumsum(subscribers[::-1])[::-1]
print(reverse_cumsum)
# [6936 6589 6263 5874 5469 4993 4519 3917 3291 2592 1775  963]
We use the cumsum() function in combination with slicing (negative step size) to accomplish the desired result.
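One way to convince yourself that the slicing trick is right is to compute the same thing differently. The sketch below is an alternative formulation, not the approach used above: element i of the reverse cumulative sum equals the total minus the ordinary cumulative sum up to i, plus the element itself. The names by_slicing and by_subtraction are chosen for this example.

```python
import numpy as np

subscribers = np.array([347, 326, 389, 405, 476, 474,
                        602, 626, 699, 817, 812, 963])

# approach from the text: reverse, accumulate, reverse back
by_slicing = np.cumsum(subscribers[::-1])[::-1]

# alternative: total - (sum up to and including i) + element i
by_subtraction = subscribers.sum() - np.cumsum(subscribers) + subscribers

print(np.array_equal(by_slicing, by_subtraction))
```

Both versions start with the overall total and end with the last element of the input.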
Cumulative distribution function (CDF) and area under the curve (AUC)
The cumulative distribution function (CDF) of a random variable X gives the probability that X takes a value less than or equal to x.
Let’s assume that we have a random variable that follows a normal (Gaussian) distribution. This is a continuous distribution, so the CDF of the normal distribution is represented by the area under the curve from negative infinity to x.
For the sake of our example, we are going to create a random, normally distributed series and compute its CDF.
Here’s the code:
import pandas as pd  # used only to create example data
import numpy as np
import matplotlib.pyplot as plt

# Create a random normally distributed series
series = pd.Series(np.random.normal(size=10000))

# Size of our data
series_size = len(series)

# Sort the data and set the bin edges
sorted_series = np.sort(series)
bins = np.append(sorted_series, sorted_series[-1] + 1)

# Use the histogram function to bin the data
hist, bin_edges = np.histogram(series, bins=bins)

# Convert the counts to floats and normalize them to relative frequencies
hist = hist.astype(float) / series_size

# Find the cdf
cdf = np.cumsum(hist)

# Plot the cdf
plt.plot(bin_edges[1:], cdf)
plt.show()
When executing the code snippet, we obtain the following plot:
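If you only want to inspect the numbers without plotting, a smaller variant of the same idea can help. The sketch below makes a few assumptions that are not in the example above: it uses a fixed seed, 50 equal-width bins instead of one bin per data point, and NumPy only. It checks the two properties every empirical CDF should have: it never decreases and it ends at 1.

```python
import numpy as np

# fixed seed so the run is reproducible (an assumption for this sketch)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=10_000)

# bin the data into 50 equal-width bins (the bin count is an arbitrary choice)
counts, bin_edges = np.histogram(samples, bins=50)

# relative frequency per bin, then the running total of those frequencies
relative_freq = counts / counts.sum()
cdf = np.cumsum(relative_freq)

print(np.all(np.diff(cdf) >= 0))   # the CDF never decreases
print(np.isclose(cdf[-1], 1.0))    # the CDF ends at 1
```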
The cumsum() function has a wide range of uses, from basic financial problems to more complex machine learning applications. Make sure to master it!
Finxter.com user Milica contributed this article. Thanks, Milica for the great content! 👩🎓