The Ultimate Guide to NumPy Cumsum in Python

Definition np.cumsum(x): The function computes the cumulative sum of a NumPy array. For an array x with elements [a b c d] the cumulative sum is [a a+b a+b+c a+b+c+d]. Formally, each element of the result with index i is the sum of all input elements with index j ≤ i.
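To make the definition concrete, here is a minimal sketch that writes the running total as a plain Python loop and compares it with np.cumsum(). The helper name running_total and the sample values are made up for this illustration.

```python
import numpy as np

def running_total(values):
    """Hypothetical helper: cumulative sum written as an explicit loop."""
    total = 0
    result = []
    for value in values:
        total += value        # add the current element to the running total
        result.append(total)  # position i now holds the sum of elements 0..i
    return result

x = np.array([2, 5, 1, 4])
manual = running_total(x.tolist())
builtin = np.cumsum(x)

print(manual)   # [2, 7, 8, 12]
print(builtin)  # [ 2  7  8 12]
```

Both versions agree; np.cumsum() simply does the accumulation for you in compiled code.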

numpy.cumsum(a, axis=None, dtype=None, out=None)

Arguments:

a — Array-like data type. Input array of the function
axis — Integer value. The axis along which you want to compute the cumulative sum. Per default, you’ll compute the cumulative sum over the flattened array.
dtype — Return array type. Also the type of the accumulated sum. Per default, it’s the dtype of array a.
out — NumPy array. If you want to store your result in an alternative array, use this argument.

Next, you’ll learn everything you need to know about np.cumsum(). So keep reading!

What is the NumPy cumsum() Function?

Given an input array, NumPy’s cumsum() function calculates the cumulative sum of the values in the array. It produces a new array as a result.

It is important to emphasize the difference between the cumulative sum and the sum:

It might seem intuitive that a cumulative sum is a single number obtained by aggregation, but this is not the case. That single number is the sum of the numbers in the array. For example, the sum of the numbers from 1 to 5 is 1+2+3+4+5 = 15. The sum represents the “total”: it aggregates the data in the array into a single number.

On the other hand, the cumulative sum would be the “running total”. Let’s say that you want to keep track of your total savings in a spreadsheet. Before you add a new amount to the savings, you want to know the previous total. For example, the first week you save $100. After the first week, you will have $100 in your savings. The second week you add another $100. After the second week, you will have $200 and so on.

If we have an array with elements (a, b, c, d) the cumulative sum is (a, a+b, a+b+c, a+b+c+d).
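The difference between the total and the running total can be sketched in a few lines. The variable names below are chosen only for this illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])

total = np.sum(numbers)       # one number for the whole array
running = np.cumsum(numbers)  # one entry per input element

print(total)                  # 15
print(running)                # [ 1  3  6 10 15]

# The last entry of the running total is the ordinary sum
print(running[-1] == total)   # True
```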

Here is an example that calculates the cumulative sum for the savings account.

# import the NumPy library
# later examples assume that this import has already been done
import numpy as np

# create an array that represents our savings each week over two months
savings = np.array([[100, 200, 150, 220], [300, 200, 150, 100]])

# calculate the cumulative sum
cumsum = np.cumsum(savings)

print(cumsum)
# [ 100  300  450  670  970 1170 1320 1420]

We can see that after the first week we had $100, after the second week we had $300 and so on. After two months, we had $1420 in our savings.

The Syntax of np.cumsum()

Let’s have a look at the general syntax:

np.cumsum(array, axis=None, dtype=None, out=None)

The function has the following arguments:

The input array can be any NumPy array, one-dimensional (“flat”) or multi-dimensional.
The axis argument is None by default. If unspecified, it computes the cumulative sum over the flattened array. Otherwise, the axis argument can be 0,1,2… depending on the array dimension. In this case, we calculate the cumulative sum along the specified axis. This is an optional argument.
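The argument dtype specifies the type of the returned array and of the accumulator in which the elements are summed. This is an optional argument. If it is not specified, the result takes the dtype of the input array, except that integer inputs with less precision than the platform’s default integer are accumulated in the default integer type.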
The argument out is an optional argument. It defines the output array in which the result of the function should be placed. If unspecified, a new array is created.
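The optional arguments can be tried out on a small array. This is only a sketch: the array values and the variable names (data, flat, as_float, buffer) are assumptions made for the example.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])

# axis=None (the default): flatten first, then accumulate
flat = np.cumsum(data)

# dtype: accumulate and return floating-point values
as_float = np.cumsum(data, dtype=float)

# out: write the result into an existing array of matching shape
buffer = np.empty_like(data)
np.cumsum(data, axis=0, out=buffer)

print(flat)            # [ 1  3  6 10]
print(as_float.dtype)  # float64
print(buffer)
# [[1 2]
#  [4 6]]
```

Passing out avoids allocating a new array, which can matter when the same computation runs many times on large data.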

NumPy cumsum() Axes

To understand how the cumsum() function works, we need a good understanding of NumPy axes. NumPy arrays can be one-dimensional or multi-dimensional.

Cumulative Sum of a Flattened Array (1-D)

One dimensional arrays are denoted as “flat”:

The one-dimensional array is a row vector and its shape is a single value iterable followed by a comma. One-dimensional arrays don’t have rows and columns, so the shape attribute returns a single value tuple.
“The Ultimate Guide to NumPy Reshape() in Python”

One-dimensional arrays only have a single axis (specified as axis=0). When using the cumsum() function, you don’t need to specify axis=0 if you are dealing with the 1-D array.

# create an array
one_D_arr = np.arange(10)

print(one_D_arr)
# [0 1 2 3 4 5 6 7 8 9]

# cumulative sum
cumsum = np.cumsum(one_D_arr)

print(cumsum)
# [ 0  1  3  6 10 15 21 28 36 45]

So when dealing with one-dimensional arrays, you don’t need to define the axis argument to calculate the cumulative sum with NumPy.
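A quick check of this claim, as a sketch with illustrative variable names:

```python
import numpy as np

one_d = np.arange(10)

default_axis = np.cumsum(one_d)           # axis=None
explicit_axis = np.cumsum(one_d, axis=0)  # the only axis a 1-D array has

print(one_d.shape)                                  # (10,)
print(np.array_equal(default_axis, explicit_axis))  # True
```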

Cumulative Sum of a Matrix (2D array)

A two-dimensional array corresponds to a matrix with rows and columns. Axis 0 runs down the rows (vertically), so an operation along axis 0 combines the values within each column. Axis 1 runs across the columns (horizontally), so an operation along axis 1 combines the values within each row.

The axes start at 0 like the indices of Python lists. If we don’t specify the axis, NumPy flattens the input array and the cumulative sum is a 1-D array.

Here is an example of a 2-D array without a specified axis:

#2-D array
two_D_arr = np.array([[1,2,3], [4,5,6]])
cumsum = np.cumsum(two_D_arr)
print(cumsum)
# [ 1  3  6 10 15 21]

Now, let’s see how we would get the cumulative sum of the two_D_arr array along axis 0. The summation runs down the rows: each column is accumulated from top to bottom.

#2-D array
two_D_arr = np.array([[1,2,3], [4,5,6]])
cumsum = np.cumsum(two_D_arr, axis = 0)
print(cumsum)
# [[1 2 3]
#  [5 7 9]]

The first row, [1,2,3], stays the same. Recall the savings example: if you saved $100 the first week, the cumulative sum after that first week is that $100.

We get the second row by adding the elements at the same positions in each row:

[1+4, 2+5, 3+6] = [5, 7, 9]

Finally, let’s see what happens when we calculate the cumulative sum over axis 1.

#2-D array
two_D_arr = np.array([[1,2,3], [4,5,6]])
cumsum = np.cumsum(two_D_arr, axis = 1)
print(cumsum)
# [[ 1  3  6]
#  [ 4  9 15]]

Here the summation happens inside each row.

1st row: [1, 3, 6] = [1, 1+2, 1+2+3]

2nd row: [4, 9, 15] = [4, 4+5, 4+5+6]
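One way to sanity-check the two axis results is to compare their last row or column with an ordinary sum along the same axis. The sketch below uses illustrative variable names.

```python
import numpy as np

two_d = np.array([[1, 2, 3], [4, 5, 6]])

down_rows = np.cumsum(two_d, axis=0)    # accumulate from top to bottom
across_cols = np.cumsum(two_d, axis=1)  # accumulate from left to right

# The last row of the axis-0 result holds the column totals
print(down_rows[-1])       # [5 7 9]
print(two_d.sum(axis=0))   # [5 7 9]

# The last column of the axis-1 result holds the row totals
print(across_cols[:, -1])  # [ 6 15]
print(two_d.sum(axis=1))   # [ 6 15]

# With an explicit axis, the result keeps the shape of the input
print(down_rows.shape, across_cols.shape)  # (2, 3) (2, 3)
```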

What’s the Difference Between Pandas cumsum() and NumPy cumsum()?

The pandas library also has a cumsum() function. The main data structure in pandas is the data frame. In a way, it’s like a 2-D array because it contains rows and columns. Unlike a plain 2-D array, a data frame resembles an Excel spreadsheet, with an index column and a header row. A pandas series is similar to a 1-D array, as it is a 1-D object.

The syntax of the pandas cumsum() function is series.cumsum(axis=None, skipna=True).

The main difference between the NumPy and pandas cumsum() functions is how they handle NaN values. In pandas, the skipna argument is True by default: NaN entries are skipped while accumulating, so each NaN stays NaN in the result and the running total carries on with the next valid value. NumPy’s cumsum() propagates NaN instead: anything added to NaN is NaN, so every element after the first NaN is NaN as well. NaN is a float value, so a series that holds whole numbers and at least one NaN has dtype float, and so does its cumulative sum.

import pandas as pd

series = pd.Series([1,2,3,np.nan])
cumsum = series.cumsum()
print(cumsum)
'''
0 1.0
1 3.0
2 6.0
3 NaN
dtype: float64
'''

Explanation

The pandas cumsum() function sums up the values in the pandas series:

1
1+2 = 3
3+3 = 6
NaN is skipped, so this entry stays NaN

Because NaN is a float, the resulting pandas series has dtype float64.
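The NaN handling is easier to see when the NaN sits in the middle of the data. The following sketch compares pandas with skipna=True and skipna=False, plain np.cumsum(), and np.nancumsum(), which treats NaN as zero. The sample values are made up for the illustration.

```python
import numpy as np
import pandas as pd

values = [1.0, np.nan, 2.0]

pandas_default = pd.Series(values).cumsum()              # skipna=True
pandas_no_skip = pd.Series(values).cumsum(skipna=False)  # NaN propagates
numpy_plain = np.cumsum(values)                          # NaN propagates
numpy_nan_as_zero = np.nancumsum(values)                 # NaN counted as 0

print(pandas_default.tolist())     # [1.0, nan, 3.0]
print(pandas_no_skip.tolist())     # [1.0, nan, nan]
print(numpy_plain.tolist())        # [1.0, nan, nan]
print(numpy_nan_as_zero.tolist())  # [1.0, 1.0, 3.0]
```

Note that np.nancumsum() fills the NaN position with the running total so far, while pandas leaves that position as NaN.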

NumPy cumprod() Function

It is good to know that NumPy also has a cumulative product function, cumprod().

Now that we understand what cumsum() does, explaining cumprod() is straightforward: the function calculates the cumulative product along an axis. I will not go into more detail about cumprod() in this blog post.

The syntax is numpy.cumprod(array, axis=None, dtype=None, out=None).

Consider the following example:

#2-D array
two_D_arr = np.array([[1,2,3], [4,5,6]])
cumprod = np.cumprod(two_D_arr)
print(cumprod)
# [  1   2   6  24 120 720]

The same axes logic that applies to cumsum() applies to cumprod().
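As a short sketch of that axes logic, here is cumprod() applied along each axis of the same 2-D array:

```python
import numpy as np

two_d = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0: multiply down each column
print(np.cumprod(two_d, axis=0))
# [[ 1  2  3]
#  [ 4 10 18]]

# axis=1: multiply across each row
print(np.cumprod(two_d, axis=1))
# [[  1   2   6]
#  [  4  20 120]]
```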

Examples

Let’s finish up with some examples.

Number of Subscribers

You want to run a report and see how many new subscribers your company had over the past year. The data is collected every 1st day of the month at midnight.

Your task is to determine how the total number of subscribers fluctuated each month, and to establish the overall trend. You can assume that nobody cancels the subscription.

Here is the number of new subscribers for each month over the past year.

'''
| Month        | Subscribers   |
|:------------:|:-------------:|
| August       | 347           |
| September    | 326           |
| October      | 389           |
| November     | 405           |
| December     | 476           |
| January      | 474           |
| February     | 602           |
| March        | 626           |
| April        | 699           |
| May          | 817           |
| June         | 812           |
| July         | 963           |
'''

Let’s plot your findings and make conclusions based on the plotted data.

# import libraries
import numpy as np
import matplotlib.pyplot as plt

subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])
cumulative_sum = np.cumsum(subscribers, dtype = int)

figure = plt.plot(subscribers, color='g', label = 'subscribers')
cumsum = plt.plot(cumulative_sum, color='orange', label = 'cumulative sum')
plt.legend(loc='upper left')
plt.show()

Executing this code that makes use of the np.cumsum() function results in the following plot:

The number of new subscribers seems to grow linearly. Because of the accumulation effect, the cumulative sum of the subscribers grows quadratically.
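The relationship between the monthly numbers and the running total can also be checked numerically. In this sketch, np.diff() undoes the accumulation; the variable name monthly_again is chosen only for the example.

```python
import numpy as np

subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])
cumulative_sum = np.cumsum(subscribers)

# Total number of new subscribers over the year
print(cumulative_sum[-1])  # 6936

# Differences between consecutive running totals give back the monthly values
monthly_again = np.diff(cumulative_sum, prepend=0)
print(np.array_equal(monthly_again, subscribers))  # True
```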

Reverse cumsum()

Let’s say that we have an array [a, b, c, d] and we want to compute [d+c+b+a, d+c+b, d+c, d]. We are going to call this a “reverse cumulative sum”. For our input array, we will use the subscribers array from the previous example.

subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])
reverse_cumsum = np.cumsum(subscribers[::-1])[::-1]
print(reverse_cumsum)
# [6936 6589 6263 5874 5469 4993 4519 3917 3291 2592 1775  963]

We use the cumsum() function in combination with slicing (negative step size) to accomplish the desired result.
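An alternative that avoids reversing the array, shown here only as a sketch: subtract the ordinary cumulative sum from the grand total and add the current element back. Both approaches give the same result.

```python
import numpy as np

subscribers = np.array([347, 326, 389, 405, 476, 474, 602, 626, 699, 817, 812, 963])

# Reverse, accumulate, reverse again
via_slicing = np.cumsum(subscribers[::-1])[::-1]

# Grand total minus everything that comes before the current element
via_total = subscribers.sum() - np.cumsum(subscribers) + subscribers

print(np.array_equal(via_slicing, via_total))  # True
print(via_total[0], via_total[-1])             # 6936 963
```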

Cumulative Distribution Function (CDF) and Area Under the Curve (AUC)

The cumulative distribution function (CDF) of a random variable X gives, for each x, the probability that X takes a value less than or equal to x.

Let’s assume that we have a random variable that follows a normal (Gaussian) distribution. This is a continuous distribution, so the CDF of the normal distribution is represented by the area under the curve from negative infinity to x.

For the sake of our example, we are going to create a random series using the np.random.normal() function, which draws random samples from the normal distribution. Then we are going to sort and bin our data. Finally, we are going to accumulate the binned fractions, which gives us an estimate of the CDF.

Here’s the code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a random normally distributed series
series = pd.Series(np.random.normal(size=10000))

# Size of our data
series_size=len(series)

# Sort the data and set bins edges
sorted_series = np.sort(series)
bins = np.append(sorted_series, sorted_series[-1]+1)

# Use the histogram function to bin the data
hist, bin_edges = np.histogram(series, bins = bins)

# Convert the counts to fractions of the sample size
hist = hist.astype(float)/series_size

# Find the cdf
cdf = np.cumsum(hist)

# Plot the cdf
plt.plot(bin_edges[1:], cdf)

plt.show()

When executing the code snippet, we obtain the following plot:

The cumsum() function has a wide range of uses, from basic financial problems to more complex machine learning applications. Make sure to master it!
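As a cross-check on the CDF example, the same idea can be sketched without a histogram: sort the samples, give each one the weight 1/n, and accumulate the weights. The seed, the sample size, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
samples = rng.normal(size=10_000)

sorted_samples = np.sort(samples)
n = sorted_samples.size

# Each sample contributes 1/n; the running total is the fraction of
# samples less than or equal to the corresponding sorted value
weights = np.full(n, 1.0 / n)
ecdf = np.cumsum(weights)

# For a standard normal distribution, about half of the samples lie below zero
index = np.searchsorted(sorted_samples, 0.0)
fraction_below_zero = ecdf[index - 1]

print(ecdf[-1])             # very close to 1.0
print(fraction_below_zero)  # roughly 0.5
```

Plotting sorted_samples against ecdf gives the same kind of S-shaped curve as the histogram-based version.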

Milica Cvetkovic — This article was contributed by Finxter user Milica.
