Three Ways to Transform Pandas Dataframes to Arrays Effortlessly

What are Arrays?

In this article, an array means a NumPy array (ndarray): a data structure similar to a Python list, except that its elements normally share a single data type. That uniformity is what makes fast, vectorized operations, such as normalizing data, possible.
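As a quick illustration, assuming "array" here means a NumPy ndarray, the sketch below builds an array from a small list and applies one vectorized operation. The names mixed_list and arr are made up for this example.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
mixed_list = [1, 2.5, 3]

# NumPy picks one dtype that can hold every element
arr = np.array(mixed_list)
print(arr.dtype)  # float64

# A vectorized operation is applied to every element at once
doubled = arr * 2
print(doubled.tolist())  # [2.0, 5.0, 6.0]
```

The integers in the list are stored as floats, because the array keeps a single data type for all of its elements.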

What are Dataframes?

A DataFrame is an ordered collection of Series that share the same index, where each Series forms a labeled column. DataFrames are used to create and manipulate tabular data.
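To make the relationship between a DataFrame and its Series concrete, here is a minimal sketch. The column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

# Selecting a single column returns a Series
score_column = df["score"]
print(type(score_column))

# The column uses the same row index as the DataFrame it came from
shares_index = score_column.index.equals(df.index)
print(shares_index)
```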

Method 1: to_numpy() – The Most Common

import pandas as pd
movies = pd.DataFrame({'Movies':['The Matrix Resurrections','West Side Story','SpiderMan No way Home'], 'Revenue':[7.5,3.0,2.5]})

# pay attention to the structure and detail of the dataframe
print('the data type of data is :',type(movies))
movies_df_2array = movies.to_numpy() # this method converts the DataFrame into a NumPy array
print('the data type of movies_df_2array is:', type(movies_df_2array))

Output:

the data type of data is : <class 'pandas.core.frame.DataFrame'>
the data type of movies_df_2array is: <class 'numpy.ndarray'>

We created a DataFrame called movies from a dictionary of key and value pairs. The keys, 'Movies' and 'Revenue', become the column labels; the values are a list of strings and a list of floats.

Example: 'The Matrix Resurrections' is paired with 7.5.
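One detail worth seeing in action is the data type of the result. The sketch below reuses the movies DataFrame from above; the variable names full_array and revenue_array are introduced just for this illustration.

```python
import pandas as pd

movies = pd.DataFrame({
    'Movies': ['The Matrix Resurrections', 'West Side Story', 'SpiderMan No way Home'],
    'Revenue': [7.5, 3.0, 2.5],
})

# Converting the whole DataFrame mixes strings and floats,
# so NumPy falls back to the generic object dtype
full_array = movies.to_numpy()
print(full_array.shape)  # (3, 2)
print(full_array.dtype)  # object

# Converting only the numeric column keeps a numeric dtype
revenue_array = movies['Revenue'].to_numpy()
print(revenue_array.dtype)  # float64
```

If you plan to do arithmetic on the result, selecting the numeric columns before calling to_numpy() is usually the more convenient route.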

Method 2: The DataFrame.values Attribute

fake_data = pd.DataFrame({'State':['New York','California','Florida'], 
                          'City':['Manhattan','Los Angeles','Miami'], 
                          'Population':(7.5,10.5,6.2)})
fake_data.values # only cell values from the dataframe will be returned as an array

Output:

array([['New York', 'Manhattan', 7.5],
       ['California', 'Los Angeles', 10.5],
       ['Florida', 'Miami', 6.2]], dtype=object)

The row and column labels have been removed; only the cell values remain!

The process is similar to Method 1: we create another DataFrame, called fake_data, from key and value pairs.

Example keys: 'State', 'City', and 'Population'.

I think you get the idea by now.
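For comparison, here is a small sketch that puts .values next to to_numpy() on the same data. The pandas documentation recommends to_numpy() over .values; the variable names below are made up for this example.

```python
import pandas as pd

fake_data = pd.DataFrame({
    'State': ['New York', 'California', 'Florida'],
    'City': ['Manhattan', 'Los Angeles', 'Miami'],
    'Population': [7.5, 10.5, 6.2],
})

from_values = fake_data.values
from_to_numpy = fake_data.to_numpy()

# Both approaches return the same cell values for this DataFrame
same_contents = bool((from_values == from_to_numpy).all())
print(same_contents)

# Selecting only the numeric column (as a one-column DataFrame)
# gives a two-dimensional float array instead of an object array
population_only = fake_data[['Population']].to_numpy()
print(population_only.shape)  # (3, 1)
print(population_only.dtype)  # float64
```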

Method 3: The Series.array Attribute – The Least Common

one_dimensional_data = pd.Series([1,2,3,4,5])
'''
one_dimensional_data
0    1
1    2
2    3
3    4
4    5
dtype: int64
'''

new_array_from_series = one_dimensional_data.array 
'''
[1, 2, 3, 4, 5]
Length: 5, dtype: int64
'''

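When using the .array attribute, make sure you are working with 1-dimensional data such as a Series. A DataFrame has no .array attribute, so accessing it there raises an AttributeError! Also note that .array returns a pandas ExtensionArray rather than a NumPy ndarray; use to_numpy() on the Series if you need a true NumPy array.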
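The following sketch shows what .array gives you on a Series and why it is limited to 1-dimensional data. The column name "n" and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

one_dimensional_data = pd.Series([1, 2, 3, 4, 5])

# .array returns a pandas ExtensionArray that wraps the data
extension_array = one_dimensional_data.array
print(isinstance(extension_array, pd.api.extensions.ExtensionArray))  # True
print(isinstance(extension_array, np.ndarray))  # False

# to_numpy() on the same Series returns a NumPy ndarray
numpy_array = one_dimensional_data.to_numpy()
print(isinstance(numpy_array, np.ndarray))  # True

# A DataFrame is 2-dimensional and does not offer .array
two_dimensional_data = one_dimensional_data.to_frame(name="n")
has_array_attribute = hasattr(two_dimensional_data, "array")
print(has_array_attribute)  # False
```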

Conclusion

These are three ways to transform pandas DataFrames and Series into arrays.

💡 Remember, NumPy is essential to the data science world. Its arrays make it easy to find each element by position and to perform vectorized operations that keep computations fast and efficient. We can slice, reshape, join, and split arrays!
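As a closing illustration, here is a short sketch of slicing, reshaping, joining, and splitting with NumPy. The array and variable names are invented for the example.

```python
import numpy as np

numbers = np.arange(6)  # the integers 0 through 5

# Slice: take the elements at positions 1, 2 and 3
sliced = numbers[1:4]

# Reshape: view the same six values as 2 rows by 3 columns
reshaped = numbers.reshape(2, 3)

# Join: concatenate the array with itself
joined = np.concatenate([numbers, numbers])

# Split: cut the array into three equal parts
parts = np.split(numbers, 3)

# Vectorized arithmetic: multiply every element without a loop
doubled = numbers * 2

print(sliced.tolist())    # [1, 2, 3]
print(reshaped.shape)     # (2, 3)
print(len(joined))        # 12
print([p.tolist() for p in parts])  # [[0, 1], [2, 3], [4, 5]]
print(doubled.tolist())   # [0, 2, 4, 6, 8, 10]
```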