How to Convert Pandas DataFrame/Series to NumPy Array?

💬 Programming Challenge: Given a Pandas DataFrame or a Pandas Series object, how do you convert it to a NumPy array?

In this short tutorial, you’ll learn (1) how to convert a 1D pandas Series to a NumPy array, and (2) how to convert a 2D pandas DataFrame to an array. Let’s get started with the first! 👇

Convert Pandas Series to NumPy Array

First, let’s create a Pandas Series.

import pandas as pd

# Create a Series (stored in the variable df)
df = pd.Series([22, 21, 20, 14],
               name='GSTitles',
               index=['Nadal', 'Djokovic', 'Federer', 'Sampras'])
print(df)

Here’s the resulting Series df:

Nadal       22
Djokovic    21
Federer     20
Sampras     14
Name: GSTitles, dtype: int64

Now that you have a Pandas Series, you can convert it to a NumPy array using the Series.to_numpy() method.

Like so:

print(df.to_numpy())
# [22 21 20 14]

The resulting object is a NumPy array:

print(type(df.to_numpy()))
# <class 'numpy.ndarray'>
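If you need more control over the result, to_numpy() also accepts the optional dtype and copy arguments. The following is a minimal sketch that goes beyond the original example; the variable names arr_float and arr_copy are chosen only for illustration.

```python
import pandas as pd

# Same Series as in the example above
df = pd.Series([22, 21, 20, 14],
               name='GSTitles',
               index=['Nadal', 'Djokovic', 'Federer', 'Sampras'])

# Ask for a specific dtype in the resulting array
arr_float = df.to_numpy(dtype='float64')
print(arr_float.dtype)
# float64

# Ask for an independent copy, so that changing the array
# cannot change the Series
arr_copy = df.to_numpy(copy=True)
arr_copy[0] = 0
print(df.iloc[0])
# 22
```

Passing copy=True is the safer choice whenever you plan to modify the array afterwards, because without it the array may share memory with the Series.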

⚡ Attention: There is also the .values attribute. Note that it is an attribute, not a method, so you access it without parentheses. It still works, but the Pandas documentation recommends to_numpy() instead with the warning “We recommend using DataFrame.to_numpy instead”.

Like to_numpy(), the .values attribute returns only the values of the DataFrame or Series. The index labels are removed.

Here’s how that’ll work:

print(df.values)
# [22 21 20 14]
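To check that both approaches give the same result for this Series, and to see that the dropped labels are still available separately, here is a small sketch. The variable names same and labels are only illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same Series as in the example above
df = pd.Series([22, 21, 20, 14],
               name='GSTitles',
               index=['Nadal', 'Djokovic', 'Federer', 'Sampras'])

# Compare the array from .values with the array from .to_numpy()
same = np.array_equal(df.values, df.to_numpy())
print(same)
# True

# The index labels are not part of the value array,
# but you can convert them to an array on their own
labels = df.index.to_numpy()
print(labels.tolist())
# ['Nadal', 'Djokovic', 'Federer', 'Sampras']
```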

This was the 1-dimensional case, a Series. Let’s move on to the 2D case next. 👇

Convert DataFrame to NumPy Array

💬 Question: How do you convert a two-dimensional DataFrame to a NumPy array?

First, let’s print the dimension of the previous Series to confirm that it was, indeed, a 1D data structure:

print(df.ndim)
# 1

Next, you create a 2D DataFrame object:

import pandas as pd


# Create a 2D DataFrame object
df2 = pd.DataFrame(data={'Nadal': [2, 14, 2, 4],
                         'Djokovic': [9, 2, 7, 3],
                         'Federer': [6, 1, 8, 5],
                         'Sampras': [2, 0, 7, 5]}, 
                  index=['AO', 'F', 'W', 'US'])

print(df2)

Here’s the resulting DataFrame:

    Nadal  Djokovic  Federer  Sampras
AO      2         9        6        2
F      14         2        1        0
W       2         7        8        7
US      4         3        5        5

Now, let’s convert this DataFrame to a NumPy array by using the DataFrame.to_numpy() method.

# Convert this DataFrame to a NumPy array
print(df2.to_numpy())

The output shows a 2D NumPy array built from the DataFrame. Great! 👾

[[ 2  9  6  2]
 [14  2  1  0]
 [ 2  7  8  7]
 [ 4  3  5  5]]

You can see that all indexing metadata has been stripped away from the resulting NumPy array!
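A NumPy array has a single dtype, so a DataFrame whose columns have different dtypes is converted to a common one. The sketch below illustrates this with a small made-up DataFrame; the column names 'a' and 'b' and their values are hypothetical and only serve the demonstration.

```python
import pandas as pd

# Hypothetical DataFrame with one integer and one float column
mixed = pd.DataFrame({'a': [1, 2],
                      'b': [0.5, 1.5]})

# Each column has its own dtype inside the DataFrame
print(mixed.dtypes.tolist())

# The array has one dtype that can hold both columns
arr = mixed.to_numpy()
print(arr.dtype)
# float64
print(arr.shape)
# (2, 2)
```

If you need a particular dtype, you can request it explicitly, for example with mixed.to_numpy(dtype='float64').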

Convert Specific Columns from DataFrame to NumPy Array

You can also convert specific columns of a Pandas DataFrame by selecting them with pandas indexing and calling the .to_numpy() method on the resulting DataFrame.

Here’s an example:

print(df2[['Djokovic', 'Federer']].to_numpy())

The output:

[[9 6]
 [2 1]
 [7 8]
 [3 5]]
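The shape of the result depends on how you select the columns. As a brief sketch using the same df2 as above: a single column name returns a Series and therefore a 1D array, while a list of column names returns a DataFrame and therefore a 2D array. The variable names one_d, two_d_single, and two_d are only illustrative.

```python
import pandas as pd

# Same DataFrame as in the example above
df2 = pd.DataFrame(data={'Nadal': [2, 14, 2, 4],
                         'Djokovic': [9, 2, 7, 3],
                         'Federer': [6, 1, 8, 5],
                         'Sampras': [2, 0, 7, 5]},
                   index=['AO', 'F', 'W', 'US'])

# Single column name: Series, so the array is 1D
one_d = df2['Federer'].to_numpy()
print(one_d.shape)
# (4,)

# List with one column name: DataFrame, so the array is 2D
two_d_single = df2[['Federer']].to_numpy()
print(two_d_single.shape)
# (4, 1)

# List with two column names: DataFrame with two columns
two_d = df2[['Djokovic', 'Federer']].to_numpy()
print(two_d.shape)
# (4, 2)
```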

Summary

You can convert a Pandas DataFrame or a Pandas Series object to a NumPy array by means of the df.to_numpy() method. The indexing metadata will be removed.

To convert only specific columns, select them with pandas indexing first and then call .to_numpy() on the resulting DataFrame.


Thanks for reading through the whole tutorial! 🙂