np.argpartition() — A Simple Illustrated Guide

5/5 - (2 votes)
np.argpartition — A Simple Illustrated Guide

In Python, the numpy.argpartition() function returns the indices that would partition an array along with a given axis based on the specified kth element(s).

All elements smaller than the kth element will be moved before it and all larger elements behind it. The element order in the partitions is undefined.

If provided with a sequence of kth, it will partition all of them into their sorted position at once.

Here is the argument table of np.argpartition().

If it sounds great to you, please continue reading, and you will fully understand the np.argpartition() function through Python code snippets and vivid visualization.

This tutorial is about np.argpartition() function. 

  • Concretely, I will introduce its syntax and arguments. 
  • Then, you will learn some basic examples of this function.
  • Finally, I will address three top questions about np.argpartition(), including np.argpartition 2d array/axis, np.argpartition order, and np.argpartition ignore np.nan.

You can find all codes in this tutorial here.

Syntax and Arguments

Here is the syntax of np.argpartition():

# Syntax
numpy.argpartition(a, kth[, axis=- 1[, 
                   kind='introselect'[, order=None]]])

Here is the argument table of np.argpartition():

ArgumentAcceptDescription
aarray_likeArray to sort.
kthint or sequence of intsElement index to partition by. All smaller elements will be moved before it and all larger elements behind it. The element order in the partitions is undefined. If provided with a sequence of k-th, it will partition all of them into their sorted position at once.
axisint or None, optionalAxis along which to sort. The default is -1 (the last axis). If None, the flattened array is used.
kind{'introselect'}, optionalSelection algorithm. Default is 'introselect'
orderstr or list of str, optionalThis argument specifies the order in which to compare fields.
A single field can be specified as a string, and not all fields need to be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.

Don’ panic! I wil explain the kth, axis, and order argument clearly later!

The output of np.argpartition() function is an index_array. If the input array a is one-dimensional, a[index_array] yields a partitioned a. Generally, np.take_along_axis(a, index_array, axis=a) always yields the partitioned a, irrespective of dimensionality.

Basic Examples

Here is a one-dimensional array code example:

# Basic Example
import numpy as np
one_dim = np.array([2, 3, 1, 5, 4])
# kth=0 -> partition based on 2(zero index).
partitioned = np.argpartition(one_dim, 0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')

Output:

As you can see, the partitioned array does not guarantee the element order in the partitions. It just separates elements into groups based on the kth argument!

This is a way to do partial sort and has an efficient time complexity O(n).

If you are more interested in sorting array as a whole, you might want to check out the numpy.argsort() function and here is my tutorial on numpy.argsort().

np.argpartition() 2d array/axis

In this part, I will show you how to deploy the argument axis with 2d array. Of course, you can extrapolate this into an n-dimensional array!

Just for your quick reference, here is the argument table of np.argpartition():

Here is the 2d array code example:

import numpy as np

# axis = 0 [partial sort along the axis = 0. In this case, row-like.]
print('axis = 0')
two_dim = np.array([[1, 4, 3], [3, 2, 1]])
partitioned = np.argpartition(two_dim, kth=0, axis=0)
print(f'Unpartitioned array: {two_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {np.take_along_axis(two_dim, partitioned, axis=0)}')

# axis = 1 [partial sort along the axis = 1. In this case, column-like.]
print('-'*85)
print('axis = 1')
two_dim = np.array([[1, 4, 3], [3, 2, 1]])
partitioned = np.argpartition(two_dim, kth=0, axis=1)
print(f'Unpartitioned array: {two_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {np.take_along_axis(two_dim, partitioned, axis=1)}')

Output:

np.argpartition() order

In this part, I will show you another exciting argument, order.

Just for your quick reference, here is the argument table of np.argpartition():

Here is a one-dimensional array code example:

import numpy as np

# order: x -> y
print('order: x -> y')
one_dim = np.array([(1, 3), (2, 1), (1, 1), (2, 3)],
                   dtype=[('x', float), ('y', float)])
partitioned = np.argpartition(one_dim, kth=0, order=['x', 'y'])
# In this case, same as:
# partitioned = np.argpartition(one_dim, kth=0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')


# order: y -> x
print('-'*85)
print('order: y -> x')
one_dim = np.array([(1, 3), (2, 1), (1, 1), (2, 3)],
                   dtype=[('x', float), ('y', float)])
partitioned = np.argpartition(one_dim, kth=0, order=['y', 'x'])
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')

Output:

np.argpartition() ignore np.nan

When we encounter tricky some np.nan in a sorting problem, we can exclude them by counting the number of np.nan and leaving them out through indexing.

Here is a case of getting the N biggest element and index in an array containing np.nan:

import numpy as np

# * Do not handle np.nan problem:
print('Do not handle np.nan problem:')
one_dim = np.array([2, 3, 1, np.nan, 5, 4, np.nan])
# kth=0 -> partition based on 2(zero index).
partitioned = np.argpartition(one_dim, 0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')

# * Handle np.nan problem:
# np.nan is taken as an infinite large number in Python.
print('-'*85)
print('Handle np.nan problem:')
one_dim = np.array([2, 3, 1, np.nan, 5, 4, np.nan])
# If we want to get the N biggest element and index in an array:
print('(If we want to get the N biggest element and index in an array)')
N = 2
# get the number of np.nan
c = np.isnan(one_dim).sum()
partitioned_idx = np.argpartition(one_dim, -N-c)[-N-c:-c]
partitioned_val = one_dim[partitioned_idx]
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned_idx}')
print(f'Partitioned array: {partitioned_val}')

Output:

Summary

That’s it for our np.argpartition() article. 

We learned about its syntax, arguments, and basic examples. 

We also worked on the top three questions about the np.argpartition() function, ranging from np.argpartition 2d array/axis, np.argpartition order, and np.argpartition ignore np.nan

By the way, if you want to do a direct partial sort, please refer to the numpy.partition() function. I believe that after reading this tutorial, you can understand numpy.partition() easily!

Hope you enjoy all this and happy coding!