How to Extract Specific NumPy Columns? 5 Best Ways

Problem Formulation and Solution Overview

In this article, you’ll learn how to extract specific columns or a sub-set thereof from a NumPy array in Python.

Often, a sub-set of data needs to be extracted from a larger dataset. This sub-set could be a pre-determined number of column(s) or row(s). These examples show you how to extract this data.

💬 Question: How would we write code to accomplish this?

We can accomplish this task by one of the following options:

Method 1: Use np.array() and slicing
Method 2: Use np.array() and np.ix_
Method 3: Use np.array() and np.arange()
Method 4: Use np.array(), np.reshape() and slicing
Bonus: np.loadtxt(), np.reshape() and slicing

Preparation

Before moving forward, please ensure the NumPy library is installed on the computer (for example, by running pip install numpy).

Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

import numpy as np

After importing the NumPy library, we can reference it through the alias (np) as shown above.

Method 1: Use np.array() and slicing

This NumPy method uses slicing to extract a specific subset from a data set. Basic slicing returns a view of the original array rather than a copy, so it remains inexpensive even with an extensive data set.

data = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
                 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

subset = data[:, 1:6:2] 
print(subset)

Above, an np.array() function is used to declare a 2D (two-dimensional) NumPy array containing a small sampling of integers. This saves to data.

Next, a subset of the above data is extracted containing all rows and columns 1, 3, and 5 using slicing (data[:, 1:6:2]) as follows:

All rows of data are selected by the first slice (:).
A comma (,) separates the row selection (:) from the column selection (1:6:2).
The column slice starts at column 1 and stops before column 6 (the stop value is exclusive), taking every 2nd column. This yields columns 1, 3, and 5, which are saved to subset.

The results are output to the terminal.

[[ 1  3  5]
 [11 13 15]
 [21 23 25]
 [31 33 35]]

Comparing this output to the original np.array() shows you how easy slicing is! A truly Pythonic approach!
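A start:stop:step slice only reaches evenly spaced columns. The sketch below is an illustration rather than part of the original method: it rebuilds the same 4x10 array with np.arange() for brevity (an assumption made here) and compares the slice with an explicit list of column indices, which can pick arbitrary columns.

```python
import numpy as np

# Same values as the 4x10 sample array above, built with arange for brevity.
data = np.arange(40).reshape(4, 10)

# A start:stop:step slice covers evenly spaced columns only.
evenly_spaced = data[:, 1:6:2]        # columns 1, 3, 5

# An explicit list of indices can pick any columns, in any order.
picked = data[:, [1, 3, 5]]           # same values as the slice
irregular = data[:, [0, 4, 9]]        # not expressible as a single slice

print(np.array_equal(evenly_spaced, picked))    # True
print(irregular)

# Basic slicing returns a view of data; list indexing returns a copy.
print(np.shares_memory(data, evenly_spaced))    # True
print(np.shares_memory(data, picked))           # False
```

The view-versus-copy difference matters when modifying the result: writing into evenly_spaced changes data, while writing into picked does not.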

Method 2: Use np.array() and np.ix_

This NumPy method uses the np.array() and np.ix_ functions to extract a chosen set of rows and columns from a data set. np.ix_ takes one list of row indices and one list of column indices and selects every combination of the two. This option can also be used with an extensive data set.

data = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
                 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

subset = data[np.ix_([2,3], [2,5])] 
print(subset)

Above, an np.array() function is used to declare a 2D (two-dimensional) NumPy array containing a small sampling of integers. This saves to data.

Next, np.ix_ selects the values where rows 2 and 3 meet columns 2 and 5. The result is saved to subset and output to the terminal.

[[22 25]
 [32 35]]
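It may help to see what happens without np.ix_. The sketch below (again rebuilding the sample array with np.arange(), an assumption made for brevity) contrasts the np.ix_ call with passing the two index lists directly, which pairs them element by element instead of combining them.

```python
import numpy as np

# Same values as the 4x10 sample array above.
data = np.arange(40).reshape(4, 10)

# np.ix_ combines every listed row with every listed column: a 2x2 block.
block = data[np.ix_([2, 3], [2, 5])]
print(block)

# Without np.ix_, the lists are paired up: elements (2, 2) and (3, 5) only.
paired = data[[2, 3], [2, 5]]
print(paired)

# To keep all rows and pick only columns 2 and 5, a slice plus a list is enough.
all_rows = data[:, [2, 5]]
print(all_rows.shape)
```

The paired form returns a 1D array of two values, which is a common source of confusion when a 2D block was intended.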

Method 3: Use np.array() and np.arange()

This NumPy method uses np.array() and np.arange() to extract a subset from a data set. The example works on a 1D (one-dimensional) NumPy array, where np.arange() generates the index positions to extract.

data = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9])
subset = data[np.arange(3, 10, 3)]
print(subset)

Above, an np.array() function is used to declare a 1D (one-dimensional) NumPy array containing a small sampling of integers. This saves to data.

Next, np.arange() is passed the following arguments, and the resulting positions (3, 6, and 9) are used to index data:

The start position of 3.
The stop position of 10. The stop value is exclusive, so the last possible position is 9.
The step size of 3.

The results are output to the terminal.

[3 6 9]
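The same idea can be carried over to columns of a 2D array by placing the np.arange() result in the column position. This is a minimal sketch, assuming the 4x10 sample array from Methods 1 and 2 (rebuilt here with np.arange() for brevity).

```python
import numpy as np

# Same values as the 4x10 sample array used in Methods 1 and 2.
data = np.arange(40).reshape(4, 10)

# Generate the column positions 3, 6 and 9 (the stop value 10 is exclusive).
cols = np.arange(3, 10, 3)

# Use the positions as column indices while keeping every row.
subset = data[:, cols]
print(subset)
print(subset.shape)
```

For evenly spaced positions like these, data[:, 3:10:3] selects the same columns; an index array becomes more useful when the positions are computed or filtered first.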

Method 4: Use np.array(), np.reshape() and slicing

This NumPy method uses np.reshape() and slicing to extract a subset from a data set. This option can also be used with an extensive data set.

data = np.arange(25)

reworked = data.reshape(5, 5)
print(reworked)

subset = reworked[:, 3]
print(subset)

Above, np.arange(25) is used to declare a 1D (one-dimensional) NumPy array containing the integers 0 to 24. This saves to data.

Next, data is reshaped into 5 rows of 5 elements each, saved to reworked, and output to the terminal to display the new layout.

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

From the reshaped data, the column at index 3 (the fourth element of each row) is extracted using reworked[:, 3], saved to the 1D array subset, and output to the terminal.

[ 3  8 13 18 23]
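Selecting a column with a single integer index returns a 1D array. When the result should stay a 2D column, a one-element slice can be used instead. The sketch below illustrates this and also passes -1 to reshape() so that NumPy infers the number of columns; neither detail comes from the example above.

```python
import numpy as np

# -1 lets NumPy infer the second dimension (25 values / 5 rows = 5 columns).
reworked = np.arange(25).reshape(5, -1)

# An integer index drops the column axis: the result is 1D.
flat = reworked[:, 3]
print(flat.shape)       # (5,)

# A one-element slice keeps the column axis: the result is 2D.
column = reworked[:, 3:4]
print(column.shape)     # (5, 1)
```

Both results hold the same five values; only the shape differs, which matters when the subset is later stacked or multiplied with other 2D arrays.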

Bonus

We have a CSV file containing five (5) sample users from the Finxter Academy. Each row holds a user ID, the number of puzzles solved correctly, and the number solved incorrectly. Using NumPy, how could we extract a subset of this data?

Contents of scores.csv

30022145,1915,68
30022192,1001,45
30022331,158,9
30022345,1415,23
30022359,1950,47

csv = np.loadtxt('scores.csv', delimiter=',', dtype=int)
csv = csv.reshape(5, 3)
print(csv)

subset = csv[:, 1]
print(subset)

The code above calls np.loadtxt() and passes it the following arguments:

The CSV file to read in. In this case, scores.csv.
The field delimiter. In this case, a comma (,).
The data type, set to integers (dtype=int).

Next, csv.reshape() is called and passed two (2) arguments, rows first:

The total number of rows in the CSV file (5).
The total number of columns in the CSV file (3).

np.loadtxt() already returns the data in this shape, so the call only makes the layout explicit. Reversing the arguments to reshape(3, 5) would spread each user's record across different rows.

Then, the NumPy array is output to the terminal.

[[30022145     1915       68]
 [30022192     1001       45]
 [30022331      158        9]
 [30022345     1415       23]
 [30022359     1950       47]]

However, we want to extract a subset of this data. The line csv[:, 1] uses slicing to extract the puzzles-solved-correctly column from every row and saves it to subset.

[1915 1001  158 1415 1950]
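When only certain columns are needed, np.loadtxt() can skip the others at load time through its usecols parameter. The sketch below is a possible variation rather than the approach used above. To stay self-contained it reads the same sample rows from an in-memory string (io.StringIO) instead of scores.csv, and the variable names are hypothetical.

```python
import io
import numpy as np

# The sample rows from scores.csv, held in memory so the sketch runs anywhere.
csv_text = """30022145,1915,68
30022192,1001,45
30022331,158,9
30022345,1415,23
30022359,1950,47
"""

# usecols=(0, 1) loads only the ID and the solved-correctly columns.
scores = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=int,
                    usecols=(0, 1))
print(scores.shape)     # (5, 2)

# Column 1 of the loaded array is the solved-correctly count per user.
correct = scores[:, 1]
print(correct)
```

With a real file, the io.StringIO object would be replaced by the file name, as in the call to np.loadtxt() shown earlier.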

Summary

These five (5) methods of extracting data from a NumPy array should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!
