Imagine you need to find a book on your bookshelf. Which situation would you prefer: A) your bookshelf contains all your books in no specific order, or B) your bookshelf contains all your books sorted alphabetically by title?
Of course, option B) would save you a lot of time, especially if you access your bookshelf multiple times. This article will show you how to use sorting in a single line of Python using the NumPy library. The article is loosely based on chapters from my book “Coffee Break NumPy” and my upcoming book “Python One-liners”. 😊
Sorting is at the heart of more advanced applications such as commercial computing, graph traversal, and search algorithms. Fortunately, NumPy provides several sorting algorithms, with the popular “Quicksort” algorithm as the default. However, for this one-liner, we take a higher-level approach and view the sorting function as a “black box”: we put in a NumPy array and get out a sorted NumPy array.
The figure shows how the algorithm transforms an unsorted array [10, 6, 8, 2, 5, 4, 9, 1] into a sorted array [1, 2, 4, 5, 6, 8, 9, 10]. This is the purpose of NumPy’s sort() function.
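As a small aside on the choice of algorithm mentioned above, here is a minimal sketch that passes each of the documented values of the `kind` argument to `np.sort()`. The array is the one from the figure; the loop is only there for illustration, and every algorithm should produce the same sorted result.

```python
import numpy as np

a = np.array([10, 6, 8, 2, 5, 4, 9, 1])

# The optional `kind` argument selects the sorting algorithm.
# "quicksort" is the default when `kind` is omitted.
for kind in ("quicksort", "mergesort", "heapsort", "stable"):
    print(kind, np.sort(a, kind=kind))

# np.sort() returns a sorted copy; the original array is unchanged.
print(a)
```

For everyday use you can simply omit `kind` and treat `np.sort()` as the black box described above.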
But oftentimes it is not only important to sort the array itself, but also to get the array of indices that would transform the unsorted array into a sorted array. For example, the array element “1” of the unsorted array has index “7”. Since the array element “1” is the first element of the sorted array, its index “7” is the first element of the sorted indices. This is the purpose of NumPy’s argsort() function.
This small code snippet demonstrates how you would use sort() and argsort() in NumPy:
import numpy as np

a = np.array([10, 6, 8, 2, 5, 4, 9, 1])

print(np.sort(a))
# [ 1  2  4  5  6  8  9 10]

print(np.argsort(a))
# [7 3 5 4 1 2 6 0]
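To make the relationship between the two functions concrete, the following sketch shows that indexing an array with its own `argsort()` result reproduces the output of `sort()`. The `labels` array is a hypothetical companion array added only for illustration; it is not part of the original example.

```python
import numpy as np

a = np.array([10, 6, 8, 2, 5, 4, 9, 1])
order = np.argsort(a)  # indices that would sort `a`

# Indexing `a` with those indices yields the sorted array.
print(a[order])
# [ 1  2  4  5  6  8  9 10]
print(np.array_equal(a[order], np.sort(a)))
# True

# The same indices can reorder a second, parallel array.
labels = np.array(list("abcdefgh"))  # hypothetical labels, one per element of `a`
print(labels[order])
# ['h' 'd' 'f' 'e' 'b' 'c' 'g' 'a']
```

Reordering a parallel array in this way is exactly the trick the one-liner later in this article relies on.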
You may ask: how is NumPy’s sort() function different from Python’s sorted() function? The answer is simple: you can use NumPy to sort multi-dimensional arrays, too!
The figure shows two ways to use the sorting function to sort a two-dimensional array. The array to be sorted has two axes: axis 0 (the rows) and axis 1 (the columns). Now, you can sort along axis 0 (each column is sorted from top to bottom) or along axis 1 (each row is sorted from left to right). In general, the axis keyword defines the direction along which you perform the NumPy operation. Here is the code snippet that shows how to do this:
import numpy as np

a = np.array([[1, 6, 2],
              [5, 1, 1],
              [8, 0, 1]])

print(np.sort(a, axis=0))
"""
[[1 0 1]
 [5 1 1]
 [8 6 2]]
"""

print(np.sort(a, axis=1))
"""
[[1 2 6]
 [1 1 5]
 [0 1 8]]
"""
The example shows that the optional axis argument helps you sort the NumPy array along a fixed direction. This is the main strength of NumPy’s sort() function compared to Python’s built-in sorted() function.
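Two further details of the axis argument may be worth a quick look. The sketch below reuses the two-dimensional array from the example and shows what happens when `axis` is omitted or set to `None`. The final call to `sorted()` on a plain list is included only for contrast.

```python
import numpy as np

a = np.array([[1, 6, 2],
              [5, 1, 1],
              [8, 0, 1]])

# Without `axis`, np.sort() sorts along the last axis (axis=-1), here axis 1.
print(np.array_equal(np.sort(a), np.sort(a, axis=1)))
# True

# axis=None flattens the array before sorting.
print(np.sort(a, axis=None))
# [0 1 1 1 1 2 5 6 8]

# Python's built-in sorted() takes a flat sequence and returns a list.
print(sorted([10, 6, 8, 2]))
# [2, 6, 8, 10]
```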
The one-liner solves the following problem: “Find the names of the three students with the highest SAT scores.” Note that simply sorting an array of SAT scores does not solve the problem because the problem asks for the names of the students. Have a look at the data first and then try to find the one-liner solution yourself.
## Dependencies
import numpy as np

## Data: SAT scores for different students
sat_scores = np.array([1100, 1256, 1543, 1043, 989, 1412, 1343])
students = np.array(["John", "Bob", "Alice", "Joe", "Jane", "Frank", "Carl"])

## One-liner
top_3 = students[np.argsort(sat_scores)][:3:-1]

## Result
print(top_3)
What’s the output of this code snippet?
Initially, the code defines the data: the SAT scores of the students as a one-dimensional array, and the names of these students as a second array. For example, student “John” achieved an SAT score of 1100, while “Frank” achieved an SAT score of 1412.
The question is to find the names of the three most successful students. The one-liner achieves this objective – not by simply sorting the SAT scores – but by running the argsort() function. Recall that the argsort() function returns an array of indices such that the respective data array elements would be sorted.
Here is the output of the argsort function on the SAT scores:
print(np.argsort(sat_scores)) # [4 3 0 1 6 5 2]
Why is the index “4” at the first position of the output? Because student “Jane” has the lowest SAT score with 989 points. Note that both sort() and argsort() sort in an ascending manner from lowest to highest values.
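Because both functions only sort in ascending order, a descending order has to be built on top of them. Here is a minimal sketch of two common ways to do that with the SAT scores from above. The variable names `ascending` and `descending` are introduced here for illustration only.

```python
import numpy as np

sat_scores = np.array([1100, 1256, 1543, 1043, 989, 1412, 1343])

ascending = np.argsort(sat_scores)
descending = ascending[::-1]  # reverse the index array
print(descending)
# [2 5 6 1 0 3 4]

# Alternative for numeric data: sort the negated values.
print(np.argsort(-sat_scores))
# [2 5 6 1 0 3 4]
```

The two variants agree here because all scores are distinct; with tied scores they may order the tied elements differently.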
You have the sorted indices, but what now? The idea is to get the names of the respective students. You can do this by using the sorted indices to index the students array:
print(students[np.argsort(sat_scores)]) # ['Jane' 'Joe' 'John' 'Bob' 'Carl' 'Frank' 'Alice']
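You already know that “Jane” has the lowest SAT score, while “Alice” has the highest. The only thing left is to reverse this order (from highest to lowest) and extract the top three students using slicing. The slice [:3:-1] walks backwards from the last element and stops before reaching index 3, so it returns the elements at indices 6, 5, and 4. Note that this yields exactly three names only because the array holds seven students: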
## One-liner
top_3 = students[np.argsort(sat_scores)][:3:-1]

## Result
print(top_3)
# ['Alice' 'Frank' 'Carl']
Alice, Frank, and Carl are the students with the highest SAT scores 1543, 1412, and 1343, respectively.
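If you want the same result for an array of any length, one possible variant is to reverse the sorted names first and then take the first k entries. The helper `top_k` and the extra student “Eve” below are hypothetical and serve only to illustrate the idea; they are not part of the original one-liner.

```python
import numpy as np

def top_k(names, scores, k):
    """Return the k names with the highest scores, best first."""
    order = np.argsort(scores)     # ascending indices
    return names[order][::-1][:k]  # reverse, then keep the first k

sat_scores = np.array([1100, 1256, 1543, 1043, 989, 1412, 1343])
students = np.array(["John", "Bob", "Alice", "Joe", "Jane", "Frank", "Carl"])

print(top_k(students, sat_scores, 3))
# ['Alice' 'Frank' 'Carl']

# With an eighth (hypothetical) student, [:3:-1] returns four names,
# while top_k() still returns three.
more_scores = np.append(sat_scores, 1200)
more_students = np.append(students, "Eve")
more_sorted = more_students[np.argsort(more_scores)]
print(more_sorted[:3:-1])
# ['Alice' 'Frank' 'Carl' 'Bob']
print(top_k(more_students, more_scores, 3))
# ['Alice' 'Frank' 'Carl']
```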