## Problem Formulation and Solution Overview

*Often, a subset of data needs to be extracted from a larger data set. This subset could be a predetermined number of columns or rows. These examples show you how to extract this data.*

## Preparation

Before moving forward, please ensure the NumPy library is installed on the computer.

Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

    import numpy as np

After importing the NumPy library, we can reference it through the alias `np`, as shown above.

## Method 1: Use np.array() and slicing

This NumPy method uses slicing to extract a specific subset from a data set. Because basic slicing returns a view of the original array rather than a copy, the code below can also be used in a production environment with an extensive data set.

    data = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
                     [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
    # All rows; columns 1, 3 and 5 (start=1, stop=6, step=2)
    subset = data[:, 1:6:2]
    print(subset)

Above, the `np.array()` function is used to declare a 2D (two-dimensional) NumPy array containing a small sampling of integers. The array is saved to `data`.

Next, a subset of `data` containing **all rows** and **columns** 1, 3, and 5 is extracted using slicing (`data[:, 1:6:2]`) as follows:

- All **rows** of `data` are selected by the first slice (`:`).
- A comma (`,`) separates the **row** slice (`:`) from the **column** slice (`1:6:2`).
- The column slice starts at column 1 and moves in steps of 2, taking every second column. It ends before the stop position (`6`), so the last column extracted is column 5. The result is saved to `subset`.

The results are output to the terminal.

    [[ 1  3  5]
     [11 13 15]
     [21 23 25]
     [31 33 35]]

Comparing this output to the original array shows you how easy slicing is! A truly Pythonic approach!
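The same `start:stop:step` notation also works on the row axis. The sketch below is an illustration rather than part of the original example: it rebuilds the 4x10 sample array with `np.arange(40).reshape(4, 10)` as a shortcut, and the names `block` and `independent` are made up for this demonstration. It also shows that a basic slice is a view that shares memory with the original array.

```python
import numpy as np

# Shortcut for the same 4x10 sample array used in Method 1.
data = np.arange(40).reshape(4, 10)

# Rows 1 and 2 (the stop index 3 is excluded); columns 1, 3 and 5.
block = data[1:3, 1:6:2]
print(block)

# Basic slicing returns a view, so no data is copied.
print(np.shares_memory(data, block))

# Call .copy() when the subset must be independent of the original.
independent = data[1:3, 1:6:2].copy()
independent[0, 0] = -1
print(data[1, 1])  # the original array is unchanged
```

If the subset will be modified later, copying it first avoids changing `data` by accident.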

## Method 2: Use np.array() and np.ix_()

This NumPy method uses the `np.array()` and `np.ix_()` functions to extract a subset of specific rows and columns from a data set. This option can also be used in a production environment with an extensive data set.

    data = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
                     [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
    # Rows 2 and 3 combined with columns 2 and 5
    subset = data[np.ix_([2, 3], [2, 5])]
    print(subset)

Above, the `np.array()` function is used to declare a 2D (two-dimensional) NumPy array containing a small sampling of integers. The array is saved to `data`.

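Next, `np.ix_()` combines the row indices `[2, 3]` with the column indices `[2, 5]` so that every listed row is paired with every listed column. The resulting 2x2 block is saved to `subset` and output to the terminal.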

    [[22 25]
     [32 35]]
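To see why `np.ix_()` is needed here, it may help to compare it with passing the two index lists directly. The sketch below is illustrative only: it rebuilds the sample array with `np.arange(40).reshape(4, 10)`, and the names `mesh` and `paired` are made up for the comparison.

```python
import numpy as np

# Shortcut for the same 4x10 sample array used above.
data = np.arange(40).reshape(4, 10)

# np.ix_ builds an open mesh: every listed row meets every listed column.
mesh = data[np.ix_([2, 3], [2, 5])]
print(mesh)    # 2x2 block

# Plain index lists are paired element by element: (2, 2) and (3, 5).
paired = data[[2, 3], [2, 5]]
print(paired)  # 1D array with two values
```

Without `np.ix_()`, only the elements at positions (2, 2) and (3, 5) are returned, not the full block of rows and columns.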

## Method 3: Use np.array() and np.arange()

This NumPy method uses `np.array()` and `np.arange()` to extract a subset from a data set. This example uses a 1D (one-dimensional) NumPy array.

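    data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    # np.arange(3, 10, 3) generates the index positions 3, 6 and 9,
    # which are then used to pick those elements out of data.
    subset = data[np.arange(3, 10, 3)]
    print(subset)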

Above, the `np.array()` function is used to declare a 1D (one-dimensional) NumPy array containing a small sampling of integers. The array is saved to `data`.

Next, `np.arange()` is passed the following arguments:

- The start position of 3.
- The stop position of 10. The stop is exclusive, so the last value generated is 9.
- The step of 3.

This generates the values 3, 6, and 9.

The results are output to the terminal.

`[3 6 9]`
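On its own, `np.arange()` only generates a sequence of numbers. To pull values out of an existing array, that sequence can be used as an index. The sketch below is an illustration with made-up values (multiples of 10), chosen so that positions and values are easy to tell apart. The names `indices`, `by_index` and `by_slice` are hypothetical.

```python
import numpy as np

# Values differ from their positions so the effect of indexing is visible.
data = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

indices = np.arange(3, 10, 3)  # positions 3, 6 and 9
by_index = data[indices]       # index array: returns a copy
by_slice = data[3:10:3]        # equivalent slice: returns a view

print(indices)
print(by_index)
print(by_slice)
```

Both approaches select the same elements. The slice avoids a copy, while an index array is more flexible because the positions do not have to be evenly spaced.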

## Method 4: Use np.array(), np.reshape() and slicing

This NumPy method uses `np.array()`, `np.reshape()` and slicing to extract a subset from a data set. This option can also be used in a production environment with an extensive data set.

    data = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
                     [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
    # Generate 0-24 and arrange the values as 5 rows of 5 elements
    reworked = np.arange(25).reshape(5, 5)
    print(reworked)
    # All rows, element at index 3
    subset = reworked[:, 3]
    print(subset)

Above, the `np.array()` function is used to declare a 2D (two-dimensional) NumPy array containing a small sampling of integers. The array is saved to `data`.

Next, `np.arange(25)` generates the integers 0 through 24, which `reshape(5, 5)` arranges into 5 rows of 5 elements each. The result is saved to `reworked` and output to the terminal to display the new layout.

    [[ 0  1  2  3  4]
     [ 5  6  7  8  9]
     [10 11 12 13 14]
     [15 16 17 18 19]
     [20 21 22 23 24]]

From `reworked`, the element at index 3 of each row is extracted using slicing (`reworked[:, 3]`), saved to the 1D array `subset`, and output to the terminal.

`[ 3  8 13 18 23]`
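A reshape only succeeds when the new shape holds exactly as many elements as the array. The sketch below is an illustration rather than part of the original example: it applies `reshape()` to the 40-element sample array (rebuilt with `np.arange(40).reshape(4, 10)` as a shortcut), and the name `reshaped` is made up.

```python
import numpy as np

# Shortcut for the same 4x10 sample array (40 elements).
data = np.arange(40).reshape(4, 10)

# A 5x5 shape needs exactly 25 elements, so this reshape fails.
try:
    data.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)

# -1 lets NumPy infer the row count: 40 elements / 5 columns = 8 rows.
reshaped = data.reshape(-1, 5)
print(reshaped.shape)

# Element at index 3 of every row.
subset = reshaped[:, 3]
print(subset)
```

Passing `-1` for one dimension is convenient when only the row length is known in advance.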

## Bonus

We have a CSV file containing five (5) sample users from the Finxter Academy. The columns are an ID, the number of puzzles solved correctly, and the number solved incorrectly. Using NumPy, how could we extract a subset of this data?

**Contents of scores.csv**

`30022145,1915,68`

    csv = np.loadtxt('scores.csv', delimiter=',', dtype=int)
    csv = csv.reshape(3, 5)
    print(csv)
    subset = csv[:, 4]
    print(subset)

The code above calls `np.loadtxt()` and passes it the following arguments:

- The CSV file to read in. In this case, `scores.csv`.
- The field delimiter. In this case, a comma (`,`).
- The data type, set to integers (`dtype=int`).

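Next, `csv.reshape()` is called and passed two (2) arguments:

- The number of rows in the reshaped array (`3`), which equals the number of columns in the CSV file.
- The number of columns in the reshaped array (`5`), which equals the number of rows in the CSV file.

Because the values are refilled in row order, each reshaped row holds five consecutive values from the file rather than one user's record.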

Then, the reshaped NumPy array is output to the terminal.

`[[30022145 1915 68 30022192 1001]`

However, we want to extract a subset of this data. To do so, the line `csv[:, 4]` uses slicing to extract the element at index 4 of each row and saves it to `subset`.

`[ 1001 30022345 47]`
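If each row should keep representing one user, a column can also be sliced straight from the loaded array without reshaping. The sketch below is an illustration under stated assumptions: the CSV text is a made-up stand-in for `scores.csv` (the IDs and scores are hypothetical), and it is read from an in-memory `io.StringIO` object so that the example runs without a file on disk.

```python
import io
import numpy as np

# Hypothetical stand-in for scores.csv: ID, correct, incorrect (made-up values).
text = "1001,150,12\n1002,98,7\n1003,210,30\n"

# np.loadtxt accepts a file-like object as well as a filename.
scores = np.loadtxt(io.StringIO(text), delimiter=",", dtype=int)
print(scores.shape)     # one row per user, one column per field

# Second column (index 1): puzzles solved correctly.
correct = scores[:, 1]
print(correct)

# usecols loads only the requested columns while reading.
ids_and_correct = np.loadtxt(io.StringIO(text), delimiter=",",
                             dtype=int, usecols=(0, 1))
print(ids_and_correct)
```

Using `usecols` can be preferable for large files, since unwanted columns never reach the array.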
