This article gives you a practical one-liner solution and shows you how to write concise NumPy code using Boolean indexing and broadcasting.
NumPy is a core library for numerical computing in Python. Not only does it add basic linear algebra functionality to Python, but its array data structure also provides a more convenient way of representing your data sets. In a way, NumPy arrays enrich the basic list data type with additional functionality such as multi-dimensional slicing and convenient indexing.
Have a look at the following code snippet.
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

indices = np.array([[False, False, True],
                    [False, False, False],
                    [True, True, False]])

print(a[indices])
# [3 7 8]
We create two arrays “a” and “indices”. The first array contains two-dimensional numerical data – you can think of it as the data array. The second array has the same shape and contains Boolean values – think of it as the indexing array. A great feature of NumPy is that you can use the Boolean array for fine-grained data array access. In plain English, we create a new NumPy array from the data array containing only those elements for which the indexing array contains “True” Boolean values at the respective array positions. Thus, the resulting array contains the three values 3, 7, and 8.
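In practice, the Boolean indexing array is often computed from the data rather than typed by hand. The following minimal sketch builds such an array with a comparison and then uses it in the same way as above. The variable names `mask` and `selected` are illustrative and not part of the original example.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# The comparison yields a Boolean array with the same shape as a.
mask = a % 2 == 0          # True wherever the element is even

# Boolean indexing returns the selected elements as a one-dimensional array,
# in row-major order.
selected = a[mask]
print(selected)            # [2 4 6 8]
```

Note that the result is always one-dimensional here, because the selected elements generally do not form a rectangular block.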
In the following one-liner, you are going to use this feature for miniature social network analysis.
We are examining the following problem: “Find the names of the Instagram superstars with more than 100 million followers!”
## Dependencies
import numpy as np

## Data: popular Instagram accounts (millions followers)
inst = np.array([[232, "@instagram"],
                 [133, "@selenagomez"],
                 [59, "@victoriassecret"],
                 [120, "@cristiano"],
                 [111, "@beyonce"],
                 [76, "@nike"]])

## One-liner
superstars = inst[inst[:,0].astype(float) > 100, 1]

## Results
print(superstars)
You can compute the result of this one-liner in your head, can’t you?
The data consists of a two-dimensional array where each row represents an Instagram influencer. The first column states their number of followers (in millions), and the second column states their Instagram name. The question is to find the names of the Instagram influencers with more than 100 million followers.
The following one-liner is one way of solving this problem. Note that there are many alternatives; this is just the one I found to have the fewest characters.
## One-liner
superstars = inst[inst[:,0].astype(float) > 100, 1]
Let’s deconstruct this one-liner step by step.
First, we compute, for each influencer, a Boolean value that indicates whether they have more than 100 million followers:
print(inst[:,0].astype(float) > 100)
# [ True  True False  True  True False]
The first column of the data array contains the number of followers, so we use slicing to access this data (inst[:,0] returns all rows but only the first column). However, the data array was created from mixed data types (integers and strings). Therefore, NumPy stores every element as a string, and the array has a string data type rather than a numerical one.
But as we want to perform numerical comparisons on the first column of the data array (checking whether each value is larger than 100), we first need to convert the array into a numerical type (for example float).
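To make the data type issue concrete, the sketch below inspects the data type of the array and converts the first column. It uses a shortened two-row version of the data, and the names `followers_text` and `followers` are illustrative.

```python
import numpy as np

inst = np.array([[232, "@instagram"],
                 [59, "@victoriassecret"]])

# Mixing integers and strings makes NumPy store every element as a string.
print(inst.dtype.kind)         # U  (Unicode string)

followers_text = inst[:, 0]    # first column, still strings: '232' and '59'
followers = followers_text.astype(float)

print(followers.dtype)         # float64
print(followers.tolist())      # [232.0, 59.0]
```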
At this point, we compare a NumPy array of type float with an integer value. What exactly happens here? You have already learned about broadcasting: NumPy automatically brings the two operands into the same shape. Then, it compares the two equally shaped arrays element-wise. The result is an array of Boolean values. Four influencers have more than 100 million followers.
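One way to picture the broadcasting step is to expand the scalar by hand. The sketch below uses `np.broadcast_to` to do that explicitly. NumPy does not need this call for the comparison itself; it only makes the implicit step visible. The follower counts are copied from the example data, and the variable names are illustrative.

```python
import numpy as np

followers = np.array([232, 133, 59, 120, 111, 76], dtype=float)

# What happens implicitly: the scalar 100 is treated as if it had
# the same shape as the array.
threshold = np.broadcast_to(100, followers.shape)

explicit = followers > threshold   # element-wise comparison of equal shapes
implicit = followers > 100         # same comparison, broadcasting done by NumPy

print(np.array_equal(explicit, implicit))   # True
print(implicit.sum())                       # 4
```

Summing a Boolean array counts its True values, which confirms the number of accounts above the threshold.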
We now take this Boolean array as an indexing array to select the influencers with more than 100 million followers (the rows).
inst[inst[:,0].astype(float) > 100, 1]
As we are only interested in the names of these influencers, we select the second column (index 1) as the final result stored in the superstars variable.
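The same selection can also be written in separate steps, which may be easier to read than the one-liner. This is a sketch of an equivalent formulation; `mask`, `rows`, and `names` are illustrative names that do not appear in the original code.

```python
import numpy as np

inst = np.array([[232, "@instagram"], [133, "@selenagomez"],
                 [59, "@victoriassecret"], [120, "@cristiano"],
                 [111, "@beyonce"], [76, "@nike"]])

mask = inst[:, 0].astype(float) > 100   # one Boolean per row
rows = inst[mask]                       # keep only the rows where mask is True
names = rows[:, 1]                      # second column (index 1): account names

# The one-liner combines the row selection and the column selection.
superstars = inst[mask, 1]
print(np.array_equal(names, superstars))   # True
```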
The influencers with more than 100 million Instagram followers are:
# ['@instagram' '@selenagomez' '@cristiano' '@beyonce']
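As one illustration of an alternative approach (an assumption for comparison, not the article's solution), the follower counts and the names can be kept in two separate arrays. Each array then keeps its natural data type, so no conversion with astype is needed. The array names `followers` and `names` are illustrative.

```python
import numpy as np

# Follower counts in millions, stored as integers.
followers = np.array([232, 133, 59, 120, 111, 76])

# Account names in the same order, stored as strings.
names = np.array(["@instagram", "@selenagomez", "@victoriassecret",
                  "@cristiano", "@beyonce", "@nike"])

# The Boolean array computed from one array indexes the other.
superstars = names[followers > 100]
print(superstars.tolist())
# ['@instagram', '@selenagomez', '@cristiano', '@beyonce']
```

This works because both arrays have the same length and the same ordering, so position i in the Boolean array refers to the same account in both.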
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He is the author of the popular programming book Python One-Liners (No Starch Press, 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.