np.diff() - A Simple Illustrated Guide

In Python, the numpy.diff() function calculates the n-th discrete difference between adjacent values in an array along a given axis. To calculate higher-order differences, numpy.diff() applies itself recursively to the output of the previous step.

The argument table of numpy.diff() appears in the Syntax and Arguments section below.

If that sounds good to you, please keep reading, and you will fully understand the numpy.diff() function through Python code snippets and visualizations.

This tutorial is about the numpy.diff() function.

Concretely, I will introduce its syntax and arguments.
Then, you will learn some basic examples of this function.
Finally, I will address three top questions about numpy.diff(), including np.diff prepend, np.diff vs. np.gradient, and np.diff datetime.

You can find all the code from this tutorial here.

Syntax and Arguments

Here is the syntax of numpy.diff():

numpy.diff(a, n=1, axis=-1, prepend=<no value>, append=<no value>)

Argument Table

Here is the argument table of numpy.diff():

Argument	Accept	Description
`a`	`array_like`	Input array
`n`	`int`, optional	The number of times values are differenced. If zero, the input is returned as-is.
`axis`	`int`, optional	The axis along which the difference is taken, default is the last axis.
`prepend`, `append`	`array_like` or scalar values, optional	Values to prepend or append to `a` along axis prior to performing the difference. Scalar values are expanded to arrays with length 1 in the direction of axis and the shape of the input array along all other axes. Otherwise, the dimension and shape must match `a` except along axis.
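To make the shape rule in the last row concrete, here is a minimal sketch. The array a and the prepended values are toy data of my own, not taken from the examples in this tutorial. I am also assuming that a mismatched prepend surfaces as a ValueError, which is what I would expect from the concatenation that happens before the difference is taken.

```python
import numpy as np

# Toy 2x3 input, used only for illustration.
a = np.arange(6).reshape(2, 3)

# Shape (1, 3) matches `a` everywhere except along axis 0, so this works.
ok = np.diff(a, axis=0, prepend=np.zeros((1, 3), dtype=a.dtype))
print(ok.shape)  # (2, 3)

# A 1-D list does not match the two dimensions of `a`, so this fails.
raised = False
try:
    np.diff(a, axis=0, prepend=[0, 0, 0])
except ValueError as err:
    raised = True
    print('ValueError:', err)
```

Wrapping the values in one more pair of brackets, as in [[0, 0, 0]], gives the (1, 3) shape that the rule asks for.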

Basic Examples

As mentioned before, to calculate higher-order differences, numpy.diff() applies itself recursively to the output of the previous step.
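If you would like to see that recursion spelled out before the longer examples, here is a minimal sketch. The variable names first_manual and second_manual are my own, and the slicing expression is just one way to write out[i] = a[i+1] - a[i] by hand.

```python
import numpy as np

# Toy 1-D input, used only for illustration.
a = np.array([1, 2, 4, 7, 12])

# First difference written out by hand: out[i] = a[i+1] - a[i].
first_manual = a[1:] - a[:-1]

# Second difference: difference the first difference once more.
second_manual = np.diff(np.diff(a))

print(np.array_equal(first_manual, np.diff(a)))        # True
print(np.array_equal(second_manual, np.diff(a, n=2)))  # True
```

Each pass shortens the array by one element, which is why the n-th difference of a length-5 array has 5 - n values.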

This function may sound abstract, but I’ve been there before. Let me help you understand this step by step!

“0” difference and 1st difference in a one-dimensional array

Here are the “0” difference and 1st difference in a one-dimensional array code examples:

import numpy as np

# “0” difference and 1st difference in one-dimensional array example
'''
The first difference is given by out[i] = a[i+1] - a[i] along the given axis, 
higher differences are calculated by using diff recursively.
'''
one_dim = np.array([1, 2, 4, 7, 12])
it_self = np.diff(one_dim, n=0)
one_diff = np.diff(one_dim, n=1)

print(f'One dimensional array: {one_dim}')
print(f'"0" difference: {it_self}')
print(f'1st difference: {one_diff}')

Output:

One dimensional array: [ 1  2  4  7 12]
"0" difference: [ 1  2  4  7 12]
1st difference: [1 2 3 5]

2nd difference and 3rd difference in a one-dimensional array

Here are the 2nd difference and 3rd difference in a one-dimensional array code examples:

import numpy as np
# 2nd difference and 3rd difference example
'''
The first difference is given by out[i] = a[i+1] - a[i] along the given axis, 
higher differences are calculated by using diff recursively.
'''
one_dim = np.array([1, 2, 4, 9, 15, 20])
one_diff = np.diff(one_dim, n=1)
two_diff = np.diff(one_dim, n=2)
three_diff = np.diff(one_dim, n=3)

print(f'One dimensional array: {one_dim}')
print(f'1st difference: {one_diff}')
print(f'2nd difference: {two_diff}')
print(f'3rd difference: {three_diff}')

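Output:

One dimensional array: [ 1  2  4  9 15 20]
1st difference: [1 2 5 6 5]
2nd difference: [ 1  3  1 -1]
3rd difference: [ 2 -2 -2]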

2nd difference in a two-dimensional array with axis = 0

Here is the 2nd difference in a two-dimensional array with axis = 0 example:

import numpy as np
# 2nd difference in two-dimensional array example - axis=0
'''
The first difference is given by out[i] = a[i+1] - a[i] along the given axis,
higher differences are calculated by using diff recursively.
'''
two_dim = np.array([[1, 2, 4, 9, 15, 20],
                   [4, 2, 1, 0, 24, 8],
                   [3, 7, 5, 13, 17, 0]])
one_diff = np.diff(two_dim, n=1, axis=0)
two_diff = np.diff(two_dim, n=2, axis=0)

print(f'Two dimensional array: {two_dim}')
print('-'*85)
print(f'1st difference: {one_diff}')
print('-'*85)
print(f'2nd difference: {two_diff}')

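Output:

Two dimensional array: [[ 1  2  4  9 15 20]
 [ 4  2  1  0 24  8]
 [ 3  7  5 13 17  0]]
-------------------------------------------------------------------------------------
1st difference: [[  3   0  -3  -9   9 -12]
 [ -1   5   4  13  -7  -8]]
-------------------------------------------------------------------------------------
2nd difference: [[ -4   5   7  22 -16   4]]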

2nd difference in a two-dimensional array with axis = 1

Here is the 2nd difference in a two-dimensional array with axis = 1 example:

import numpy as np
# 2nd difference in two-dimensional array example - axis=1
'''
The first difference is given by out[i] = a[i+1] - a[i] along the given axis, 
higher differences are calculated by using diff recursively.
'''
two_dim = np.array([[1, 2, 4, 9, 15, 20],
                   [4, 2, 1, 0, 24, 8],
                   [3, 7, 5, 13, 17, 0]])
one_diff = np.diff(two_dim, n=1, axis=1)
two_diff = np.diff(two_dim, n=2, axis=1)

print(f'Two dimensional array: {two_dim}')
print('-'*85)    
print(f'1st difference: {one_diff}')
print('-'*85)
print(f'2nd difference: {two_diff}')

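Output:

Two dimensional array: [[ 1  2  4  9 15 20]
 [ 4  2  1  0 24  8]
 [ 3  7  5 13 17  0]]
-------------------------------------------------------------------------------------
1st difference: [[  1   2   5   6   5]
 [ -2  -1  -1  24 -16]
 [  4  -2   8   4 -17]]
-------------------------------------------------------------------------------------
2nd difference: [[  1   3   1  -1]
 [  1   0  25 -40]
 [ -6  10  -4 -21]]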

Now, I hope you understand how numpy.diff() calculates higher-order differences and how the axis argument controls the direction of the calculation.
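One more way to keep n and axis straight is to watch the output shape. The sketch below uses a toy 3x6 array of my own (np.arange reshaped) and a dictionary named shapes that I introduce only to collect the results.

```python
import numpy as np

# Toy 3x6 input, used only for illustration.
two_dim = np.arange(18).reshape(3, 6)

# Record the output shape for each combination of axis and n.
shapes = {}
for axis in (0, 1):
    for n in (1, 2):
        out = np.diff(two_dim, n=n, axis=axis)
        shapes[(axis, n)] = out.shape
        print(f'axis={axis}, n={n}: {two_dim.shape} -> {out.shape}')
```

The chosen axis loses n entries each time, and every other axis keeps its length.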

Let us now dive into the top questions about this function and deepen our understanding!

np.diff() prepend

First, many people find the prepend and append arguments of this function hard to understand.

Since these two arguments work in much the same way, I will help you understand the prepend argument in this part and leave the append argument for you to figure out yourself 🙂

Recall the description of the prepend argument from the argument table above.

From that description, we can see that there are two ways, the array way and the scalar way, to prepend values to the input array a along the given axis before the difference calculation is performed.
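Before the two-dimensional examples, it may help to see that prepend is simply a concatenation followed by the usual difference. This is a minimal sketch on a toy one-dimensional array of my own; the names with_prepend and by_hand are made up for illustration.

```python
import numpy as np

# Toy 1-D input, used only for illustration.
a = np.array([1, 2, 4, 7, 12])

# Let np.diff() put a 0 in front of the data before differencing.
with_prepend = np.diff(a, prepend=0)

# Do the same thing by hand: concatenate first, then difference.
by_hand = np.diff(np.concatenate(([0], a)))

print(with_prepend)                           # [1 1 2 3 5]
print(np.array_equal(with_prepend, by_hand))  # True
print(with_prepend.shape == a.shape)          # True
```

Because one value is added and one is lost to the difference, the result has the same length as the input.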

Here is the array way:

import numpy as np

# prepend with array - axis=0
two_dim = np.array([[1, 2, 4, 9, 15, 20],
                   [4, 2, 1, 0, 24, 8],
                   [3, 7, 5, 13, 17, 0]])

one_diff = np.diff(two_dim, n=1, axis=0, prepend=[[1] * two_dim.shape[1]])
two_diff = np.diff(two_dim, n=2, axis=0, prepend=[[1] * two_dim.shape[1]])
# one_diff = np.diff(two_dim, n=1, axis=0, prepend=[[1, 1, 1, 1, 1, 1]])
# two_diff = np.diff(two_dim, n=2, axis=0, prepend=[[1, 1, 1, 1, 1, 1]])

print(f'Two dimensional array: {two_dim}')
print('-'*85)
print(f'1st difference: {one_diff}')
print('-'*85)
print(f'2nd difference: {two_diff}')

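Output:

Two dimensional array: [[ 1  2  4  9 15 20]
 [ 4  2  1  0 24  8]
 [ 3  7  5 13 17  0]]
-------------------------------------------------------------------------------------
1st difference: [[  0   1   3   8  14  19]
 [  3   0  -3  -9   9 -12]
 [ -1   5   4  13  -7  -8]]
-------------------------------------------------------------------------------------
2nd difference: [[  3  -1  -6 -17  -5 -31]
 [ -4   5   7  22 -16   4]]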

Here is the scalar values way:

# prepend with scalar values - axis=0
import numpy as np
two_dim = np.array([[1, 2, 4, 9, 15, 20],
                   [4, 2, 1, 0, 24, 8],
                   [3, 7, 5, 13, 17, 0]])

one_diff = np.diff(two_dim, n=1, axis=0, prepend=1)
two_diff = np.diff(two_dim, n=2, axis=0, prepend=1)
# The scalar 1 is expanded to one row of ones, so the calls above are equivalent to:
# one_diff = np.diff(two_dim, n=1, axis=0, prepend=[[1, 1, 1, 1, 1, 1]])
# two_diff = np.diff(two_dim, n=2, axis=0, prepend=[[1, 1, 1, 1, 1, 1]])
print(f'Two dimensional array: {two_dim}')
print('-'*85)
print(f'1st difference: {one_diff}')
print('-'*85)
print(f'2nd difference: {two_diff}')

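Output:

Two dimensional array: [[ 1  2  4  9 15 20]
 [ 4  2  1  0 24  8]
 [ 3  7  5 13 17  0]]
-------------------------------------------------------------------------------------
1st difference: [[  0   1   3   8  14  19]
 [  3   0  -3  -9   9 -12]
 [ -1   5   4  13  -7  -8]]
-------------------------------------------------------------------------------------
2nd difference: [[  3  -1  -6 -17  -5 -31]
 [ -4   5   7  22 -16   4]]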

In conclusion, you can pass either a scalar value or an array to prepend or append to the input array a along the given axis before the difference calculation is performed.

Passing a scalar is easier if you just want to prepend or append the same value everywhere. The array option gives you the flexibility to prepend or append any values you like.
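If you want to check your own solution for append, here is one possible sketch. It reuses the two-dimensional array from the examples above, but the choice of axis=1 and the appended value 0 are my own assumptions for illustration.

```python
import numpy as np

two_dim = np.array([[1, 2, 4, 9, 15, 20],
                    [4, 2, 1, 0, 24, 8],
                    [3, 7, 5, 13, 17, 0]])

# Scalar way: the 0 is expanded to one extra column placed after the last one.
scalar_way = np.diff(two_dim, axis=1, append=0)

# Array way: one value per row, so the shape is (3, 1).
array_way = np.diff(two_dim, axis=1, append=[[0], [0], [0]])

print(np.array_equal(scalar_way, array_way))  # True
print(scalar_way[:, -1])                      # [-20  -8   0]
```

The last column of the result is the appended value minus the last original column, which is the mirror image of what prepend does at the start.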

np.diff() vs np.gradient()

Another common source of confusion is how this function differs from numpy.gradient().

Simply put, numpy.diff() calculates the n-th discrete difference between adjacent values along a given axis, which involves only subtraction.
numpy.gradient(), by contrast, calculates the gradient of an N-dimensional array, which involves both subtraction and division.

In numpy.gradient(), the gradient is computed using second-order accurate central differences at the interior points and either first- or second-order accurate one-sided (forward or backward) differences at the boundaries. The returned gradient therefore has the same shape as the input array.

Intuitively, numpy.gradient() measures the rate of change in an N-dimensional array, much like the slope of a line in a two-dimensional plane.
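Here is a small side-by-side sketch of the two functions. The array y is toy data of my own with an assumed spacing of 1 between samples, and numpy.gradient() is called with its default settings.

```python
import numpy as np

# Toy samples, assumed to be one unit apart.
y = np.array([1, 2, 4, 7, 12])

d = np.diff(y)      # differences of neighbours, one element shorter than y
g = np.gradient(y)  # central differences inside, one-sided at both ends

print(d.tolist())  # [1, 2, 3, 5]
print(g.tolist())  # [1.0, 1.5, 2.5, 4.0, 5.0]

# With unit spacing, each interior gradient is the mean of the two
# neighbouring first differences.
print(np.allclose(g[1:-1], (d[:-1] + d[1:]) / 2))  # True
```

Note that the gradient has five values for five inputs, while the first difference has only four.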

To be honest, numpy.gradient() is another hard-to-understand function. If you would like me to write another article about it, please let me know! 🙂

For now, I hope you know intuitively what the difference is between these two functions.

np.diff() datetime

In our previous examples, we have only dealt with numerical values. Good news! The np.diff() function can also handle arrays of datetime64 values!

Here is an example of handling a datetime array:

import numpy as np

'''
Generally, the type of the np.diff() output is the same as the type of the difference between any two elements of the input array.
A notable exception is datetime64, which results in a timedelta64 output array.
'''
# dates = np.arange('1100-10-01', '1100-10-05', dtype=np.datetime64)
# one_diff = np.diff(dates, n=1)
dates = np.arange('2066-10-13', '2066-10-16', dtype=np.datetime64)
one_diff = np.diff(dates, n=1)

print(f'Original dates: {dates}')
print('-'*85)
print(f'Original date\'s type: {dates.dtype}')
print('-'*85)
print(f'One difference: {one_diff}')
print('-'*85)
print(f'One difference\'s type: {one_diff.dtype}')

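Output:

Original dates: ['2066-10-13' '2066-10-14' '2066-10-15']
-------------------------------------------------------------------------------------
Original date's type: datetime64[D]
-------------------------------------------------------------------------------------
One difference: [1 1]
-------------------------------------------------------------------------------------
One difference's type: timedelta64[D]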

Please be aware that, in general, the type of the np.diff() output is the same as the type of the difference between any two elements of the input array.
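One place where this rule can surprise you is unsigned integers. The following sketch uses a toy uint8 array of my own; the cast to int16 is just one possible workaround, not the only one.

```python
import numpy as np

# Toy unsigned input, used only for illustration.
u8 = np.array([1, 0], dtype=np.uint8)

wrapped = np.diff(u8)                  # stays uint8, so 0 - 1 wraps around
signed = np.diff(u8.astype(np.int16))  # cast to a signed type first

print(wrapped, wrapped.dtype)  # [255] uint8
print(signed, signed.dtype)    # [-1] int16
```

If negative differences are possible in your data, casting to a signed type before calling np.diff() avoids the wrap-around.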

The notable exception, as we just saw, is datetime64: its differences come back as a timedelta64 array.
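If you need plain numbers rather than timedelta64 values, dividing by a unit timedelta is one option. This is a minimal sketch; the timestamps and the names stamps, gaps and gap_hours are made up for illustration.

```python
import numpy as np

# Toy timestamps with uneven gaps, stored at minute resolution.
stamps = np.array(['2066-10-13T08:00', '2066-10-13T09:30', '2066-10-14T09:30'],
                  dtype='datetime64[m]')

gaps = np.diff(stamps)                     # timedelta64[m]
gap_hours = gaps / np.timedelta64(1, 'h')  # plain floats, measured in hours

print(gaps.dtype)          # timedelta64[m]
print(gap_hours.tolist())  # [1.5, 24.0]
```

The division turns the 90-minute and 1440-minute gaps into ordinary floats that you can plot or average like any other numbers.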

Summary

That’s it for our np.diff() article.

We learned about its syntax, arguments, and basic examples.

We also worked through the top three questions about the np.diff() function: np.diff prepend, np.diff vs. np.gradient, and np.diff datetime.

I hope you enjoyed all this. Happy coding!