Conditional Indexing: How to Conditionally Select Elements in a NumPy Array?

Problem Description: You have a NumPy array and want to select specific elements from it, but neither slicing nor plain indexing seems to solve your problem. What can you do?

In this short tutorial, I show you how to select specific NumPy array elements via Boolean arrays, a feature called conditional indexing or selective indexing.

❗ Selective Indexing: NumPy arrays can be sliced to extract subareas of the global array. Normal slicing such as a[i:j] carves out the contiguous sequence from index i (inclusive) to index j (exclusive). Selective indexing (also: conditional indexing) instead lets you carve out an arbitrary combination of elements by passing a Boolean array with the same shape as the NumPy array. If the Boolean value at index (i,j) is True, the element at (i,j) is selected; otherwise it is skipped.

For example, the comparison A > 3 is broadcast across the whole array and produces a Boolean array, which you can use directly to select all elements greater than 3:

import numpy as np


A = np.array([[1, 2, 3],
              [4, 5, 6],
              [1, 2, 3]])

print(A[A > 3])
# [4 5 6]
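
To make the intermediate step visible, here is a small sketch that prints the Boolean array produced by the comparison and then selects a two-sided range. The names mask and in_range are my own choices for illustration, and the bounds 2 and 4 are arbitrary.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [1, 2, 3]])

# The comparison is broadcast over A and yields a Boolean array of the same shape.
mask = A > 3
print(mask)
# [[False False False]
#  [ True  True  True]
#  [False False False]]

# Combine conditions with & (and) or | (or). The parentheses are required
# because & binds more tightly than the comparison operators.
in_range = (A >= 2) & (A <= 4)
print(A[in_range])
# [2 3 4 2 3]
```

Note that Python's and/or keywords do not work element-wise on arrays, which is why the sketch uses & instead.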

Here’s another example of selective indexing:

import numpy as np


a = np.arange(9)
a = a.reshape((3,3))

print(a)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

b = np.array(
    [[True, False, False],
     [False, True, False],
     [False, False, True]])
print(a[b])
# Flattened array with selected values from a
# [0 4 8]

In the above code, the Boolean matrix b with shape (3,3) is used as the index of a: an element of a is selected wherever b holds True.

Beautiful, isn’t it?
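
You don't have to type the Boolean matrix by hand. As a minimal sketch, assuming you want the same diagonal mask as above, you can build it with np.eye and check that Boolean indexing picks the same elements as indexing with the coordinates of the True entries. The names rows and cols are only illustrative.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

# Build the diagonal mask programmatically instead of typing it out.
b = np.eye(3, dtype=bool)

# Indexing with a Boolean array selects the same elements as indexing
# with the coordinates of its True entries.
rows, cols = np.nonzero(b)
print(a[b])           # [0 4 8]
print(a[rows, cols])  # [0 4 8]
```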

Let me highlight an important detail. With a Boolean array you can select an arbitrary number of elements from each row and column. How is NumPy supposed to decide on the final shape? For example, you may select three elements from column 0 but only two from column 1, so the result does not fit into a rectangular matrix. There is only one solution: the result of this operation is a one-dimensional NumPy array that lists the selected elements in row-major order.
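
Here is a short sketch that checks the shape of the result. It also shows np.where as one possible way to keep the original shape if you need it; the fill value -1 is an arbitrary assumption on my side, not part of the example above.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
mask = a % 2 == 0          # True for the even entries 0, 2, 4, 6, 8

selected = a[mask]
print(selected.shape)      # (5,)
print(selected)            # [0 2 4 6 8]

# np.where keeps the original shape by substituting a fill value
# (here -1, an arbitrary choice) wherever the mask is False.
kept = np.where(mask, a, -1)
print(kept)
# [[ 0 -1  2]
#  [-1  4 -1]
#  [ 6 -1  8]]
```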

Background

Let’s start with two pieces of background information to help you process the code more effectively:

💡 The function np.arange([start,] stop[, step]) creates a new array with evenly spaced numbers between start (inclusive) and stop (exclusive) with the given step size. For example, np.arange(1, 6, 2) creates the NumPy array [1, 3, 5]. You can also skip the start and step arguments (default values are start=0 and step=1).
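
A few calls illustrate the different argument combinations. The concrete numbers are just examples I picked:

```python
import numpy as np

print(np.arange(5))          # [0 1 2 3 4]
print(np.arange(2, 5))       # [2 3 4]
print(np.arange(1, 6, 2))    # [1 3 5]
print(np.arange(10, 0, -3))  # [10  7  4  1]
```

A negative step counts downwards, and stop is still excluded.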

The second piece of background is the concept of reshaping a NumPy array:

💡 The function array.reshape(shape) takes a shape tuple as an argument, where each tuple value defines the number of data values along a single dimension. It returns the array's data arranged in the new form specified by the shape argument. The total number of elements must stay the same.
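
As a minimal sketch of reshaping, the following turns six values into a 2x3 grid. The names flat and grid are illustrative only. The -1 entry is a NumPy convenience that lets one dimension be inferred from the others.

```python
import numpy as np

flat = np.arange(6)          # [0 1 2 3 4 5]
grid = flat.reshape((2, 3))
print(grid)
# [[0 1 2]
#  [3 4 5]]

# -1 asks NumPy to infer that dimension from the total element count.
print(flat.reshape((3, -1)).shape)   # (3, 2)

# A shape whose product differs from the element count raises ValueError.
try:
    flat.reshape((4, 2))
except ValueError as err:
    print("cannot reshape:", err)
```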
