In Python, the numpy.argpartition()
function returns the indices that would partition an array along with a given axis based on the specified kth element(s).
All elements smaller than the kth element will be moved before it and all larger elements behind it. The element order in the partitions is undefined.
If provided with a sequence of kth, it will partition all of them into their sorted position at once.
Here is the argument table of np.argpartition()
.
If it sounds great to you, please continue reading, and you will fully understand the np.argpartition()
function through Python code snippets and vivid visualization.
This tutorial is about np.argpartition()
function.
- Concretely, I will introduce its syntax and arguments.
- Then, you will learn some basic examples of this function.
- Finally, I will address three top questions about
np.argpartition()
, includingnp.argpartition
2d array/axis, np.argpartition order, andnp.argpartition
ignorenp.nan
.
You can find all codes in this tutorial here.
Syntax and Arguments
Here is the syntax of np.argpartition()
:
# Syntax numpy.argpartition(a, kth[, axis=- 1[, kind='introselect'[, order=None]]])
Here is the argument table of np.argpartition()
:
Argument | Accept | Description |
a | array_like | Array to sort. |
kth | int or sequence of ints | Element index to partition by. All smaller elements will be moved before it and all larger elements behind it. The element order in the partitions is undefined. If provided with a sequence of k-th, it will partition all of them into their sorted position at once. |
axis | int or None , optional | Axis along which to sort. The default is -1 (the last axis). If None , the flattened array is used. |
kind | {'introselect'} , optional | Selection algorithm. Default is 'introselect' |
order | str or list of str , optional | This argument specifies the order in which to compare fields. A single field can be specified as a string, and not all fields need to be specified, but unspecified fields will still be used, in the order in which they come up in the dtype , to break ties. |
Donβ panic! I wil explain the kth, axis, and order argument clearly later!
The output of np.argpartition()
function is an index_array
. If the input array a
is one-dimensional, a[index_array]
yields a partitioned a
. Generally, np.take_along_axis(a, index_array, axis=a)
always yields the partitioned a
, irrespective of dimensionality.
Basic Examples
Here is a one-dimensional array code example:
# Basic Example import numpy as np one_dim = np.array([2, 3, 1, 5, 4]) # kth=0 -> partition based on 2(zero index). partitioned = np.argpartition(one_dim, 0) print(f'Unpartitioned array: {one_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {one_dim[partitioned]}')
Output:
As you can see, the partitioned array does not guarantee the element order in the partitions. It just separates elements into groups based on the kth argument!
This is a way to do partial sort and has an efficient time complexity O(n).
If you are more interested in sorting array as a whole, you might want to check out the numpy.argsort()
function and here is my tutorial on numpy.argsort().
np.argpartition() 2d array/axis
In this part, I will show you how to deploy the argument axis with 2d array. Of course, you can extrapolate this into an n-dimensional array!
Just for your quick reference, here is the argument table of np.argpartition()
:
Here is the 2d array code example:
import numpy as np # axis = 0 [partial sort along the axis = 0. In this case, row-like.] print('axis = 0') two_dim = np.array([[1, 4, 3], [3, 2, 1]]) partitioned = np.argpartition(two_dim, kth=0, axis=0) print(f'Unpartitioned array: {two_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {np.take_along_axis(two_dim, partitioned, axis=0)}') # axis = 1 [partial sort along the axis = 1. In this case, column-like.] print('-'*85) print('axis = 1') two_dim = np.array([[1, 4, 3], [3, 2, 1]]) partitioned = np.argpartition(two_dim, kth=0, axis=1) print(f'Unpartitioned array: {two_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {np.take_along_axis(two_dim, partitioned, axis=1)}')
Output:
np.argpartition() order
In this part, I will show you another exciting argument, order.
Just for your quick reference, here is the argument table of np.argpartition()
:
Here is a one-dimensional array code example:
import numpy as np # order: x -> y print('order: x -> y') one_dim = np.array([(1, 3), (2, 1), (1, 1), (2, 3)], dtype=[('x', float), ('y', float)]) partitioned = np.argpartition(one_dim, kth=0, order=['x', 'y']) # In this case, same as: # partitioned = np.argpartition(one_dim, kth=0) print(f'Unpartitioned array: {one_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {one_dim[partitioned]}') # order: y -> x print('-'*85) print('order: y -> x') one_dim = np.array([(1, 3), (2, 1), (1, 1), (2, 3)], dtype=[('x', float), ('y', float)]) partitioned = np.argpartition(one_dim, kth=0, order=['y', 'x']) print(f'Unpartitioned array: {one_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {one_dim[partitioned]}')
Output:
np.argpartition() ignore np.nan
When we encounter tricky some np.nan
in a sorting problem, we can exclude them by counting the number of np.nan
and leaving them out through indexing.
Here is a case of getting the N biggest element and index in an array containing np.nan
:
import numpy as np # * Do not handle np.nan problem: print('Do not handle np.nan problem:') one_dim = np.array([2, 3, 1, np.nan, 5, 4, np.nan]) # kth=0 -> partition based on 2(zero index). partitioned = np.argpartition(one_dim, 0) print(f'Unpartitioned array: {one_dim}') print(f'Partitioned array index: {partitioned}') print(f'Partitioned array: {one_dim[partitioned]}') # * Handle np.nan problem: # np.nan is taken as an infinite large number in Python. print('-'*85) print('Handle np.nan problem:') one_dim = np.array([2, 3, 1, np.nan, 5, 4, np.nan]) # If we want to get the N biggest element and index in an array: print('(If we want to get the N biggest element and index in an array)') N = 2 # get the number of np.nan c = np.isnan(one_dim).sum() partitioned_idx = np.argpartition(one_dim, -N-c)[-N-c:-c] partitioned_val = one_dim[partitioned_idx] print(f'Unpartitioned array: {one_dim}') print(f'Partitioned array index: {partitioned_idx}') print(f'Partitioned array: {partitioned_val}')
Output:
Summary
Thatβs it for our np.argpartition()
article.
We learned about its syntax, arguments, and basic examples.
We also worked on the top three questions about the np.argpartition()
function, ranging from np.argpartition
2d array/axis, np.argpartition
order, and np.argpartition
ignore np.nan
.
By the way, if you want to do a direct partial sort, please refer to the numpy.partition()
function. I believe that after reading this tutorial, you can understand numpy.partition()
easily!
Hope you enjoy all this and happy coding!