Understanding Sorting Techniques in NumPy: argsort and lexsort

Rate this post

πŸ’‘ Problem Formulation: When working with data in Python, sorting arrays is a common task. Specifically, we need efficient methods to sort NumPy arrays based on either their own values or the values in a secondary array. For instance, imagine you have an array of scores and another of names; you want to sort the scores, maintaining the association with the correct names. This article delves into various approaches to tackle such sorting challenges.

Method 1: Using np.argsort() to Sort Arrays

Numpy’s argsort() function is used to return the indices that would sort an array along the specified axis. This method is highly useful when you need the order of indices to arrange the elements of an array and maintain structure with other related arrays. It’s a single-dimensional sorting approach.

Here’s an example:

import numpy as np

scores = np.array([80, 95, 70, 100, 90])
sorted_indices = np.argsort(scores)

print(sorted_indices)

Output:

[2, 0, 4, 1, 3]

This code snippet first creates an array of scores. The np.argsort() function then returns indices of the scores in ascending order. For instance, the score 70 has the index 2 and is the smallest, thus the first in the sorted indices. This method allows us to sort arrays and use the indices to sort other related arrays.

Method 2: Sorting with Stability Using np.sort() and np.argsort()

Combining the sort() function with argsort() gives a stable sort, preserving the order of equal elements as in the original array. It’s beneficial when equal items have semantic ordering that should be retained.

Here’s an example:

import numpy as np

names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
scores = np.array([100, 100, 95, 90])
sorted_indices = np.argsort(scores, kind='stable')

sorted_names = names[sorted_indices]

print(sorted_names)

Output:

['David', 'Charlie', 'Alice', 'Bob']

This snippet utilizes np.argsort() with a ‘stable’ sorting algorithm which is especially important when sorting arrays like scores where the same score doesn’t mean the associated names should switch places. Thus, ‘Alice’ and ‘Bob’ retain their positional order post-sorting.

Method 3: Multidimensional Sorting with np.lexsort()

The np.lexsort() function sorts multiple arrays based on the last key passed. It is very useful when you want to sort by primary, secondary criteria, and so forth, similar to lexicographical sorting for multidimensional data.

Here’s an example:

import numpy as np

last_names = np.array(['Smith', 'Doe', 'Jones', 'Smith'])
first_names = np.array(['John', 'Jane', 'Ted', 'Alice'])
sorted_indices = np.lexsort((first_names, last_names))

print(np.column_stack((last_names[sorted_indices], first_names[sorted_indices])))

Output:

[['Doe' 'Jane']
 ['Jones' 'Ted']
 ['Smith' 'Alice']
 ['Smith' 'John']]

The code sorts an array of names by last and then first names. The np.lexsort() uses the last argument, ‘last_names’, for primary order, then ‘first_names’ for secondary order. We stack the sorted arrays column-wise to display sorted pairs.

Method 4: Sorting with np.argsort() in Descending Order

Sometimes, the goal is to sort arrays in descending order. By combining argsort() with array slicing, we can reverse the sort order easily.

Here’s an example:

import numpy as np

scores = np.array([80, 95, 70, 100, 90])
sorted_indices_desc = np.argsort(-scores)

print(sorted_indices_desc)

Output:

[3, 1, 4, 0, 2]

By negating the ‘scores’ array before passing it to np.argsort(), the function sorts as if in descending order, but without altering the original data. The indices can then be used to reorder other associated arrays if needed.

Bonus One-Liner Method 5: Sorting Simplicity with Comprehensions

Python’s list comprehensions can often provide a more readable and Pythonic way of sorting arrays based on other arrays, especially when dealing with small datasets or when performance is not a critical concern.

Here’s an example:

import numpy as np

names = np.array(['Bob', 'Alice', 'Charlie'])
scores = np.array([90, 100, 95])

sorted_names = [name for _, name in sorted(zip(scores, names), reverse=True)]

print(sorted_names)

Output:

['Alice', 'Charlie', 'Bob']

This one-liner zips scores with names, sorts the paired list by scores in descending order, and extracts the names from the sorted pairs. This method is very intuitive for those familiar with Python’s list comprehensions.

Summary/Discussion

  • Method 1: np.argsort(). Simple and straightforward. Only sorts indices, which can be a limitation if actual values sorting is needed.
  • Method 2: Stable np.sort() + np.argsort(). Ensures that original order amongst equal elements is preserved. It’s slightly more complex than basic argsort.
  • Method 3: np.lexsort(). Good for sorting based on multiple keys. Potentially confusing because it sorts keys starting from the last one.
  • Method 4: np.argsort() Descending. Intuitive way to achieve descending order sorting. Negating may not be directly applicable for non-numeric data types.
  • Method 5: Comprehensions. Pythonic, readable. May not be suitable for performance-critical tasks or very large datasets.