What are Arrays?
A NumPy array is a data structure similar to a Python list. Unlike a list, an array normally stores elements of a single data type, which is what makes numerical operations on it fast. Arrays provide fast and versatile ways to process and normalize data.
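As a small sketch of how an array differs from a list (the variable names here are illustrative and not part of the original article), NumPy converts the elements to one common data type and then lets us operate on all of them at once:

```python
import numpy as np

# A plain Python list can hold objects of different types.
mixed_list = [1, 2.5, 3]

# Converting it to a NumPy array upcasts every element to one common dtype.
arr = np.array(mixed_list)
print(arr.dtype)   # float64

# Vectorized arithmetic applies to every element without an explicit loop.
doubled = arr * 2
print(doubled)     # [2. 5. 6.]
```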
What are Dataframes?
A DataFrame is an ordered collection of Series that share the same index. Its labeled columns make it convenient to create and manipulate tabular data.
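To make the "collection of Series" idea concrete, here is a minimal sketch using made-up data (the column names `name` and `score` are assumptions for illustration only). Selecting one column gives back a Series, and that Series carries the same index as the DataFrame:

```python
import pandas as pd

# Hypothetical example data, chosen only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.0, 2.0]})

# Selecting a single column returns a Series.
col = df["score"]
print(type(col))

# Every column shares the DataFrame's row index.
print(col.index.equals(df.index))  # True
```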
Method 1: to_numpy() – The Most Common
import pandas as pd

movies = pd.DataFrame({'Movies': ['The Matrix Resurrections', 'West Side Story', 'SpiderMan No way Home'],
                       'Revenue': [7.5, 3.0, 2.5]})
# Pay attention to the structure and detail of the DataFrame
print('the data type of data is :', type(movies))

movies_df_2array = movies.to_numpy()  # This method converts the DataFrame into an array
print('the data type of movies_df_2array is:', type(movies_df_2array))
Output:
the data type of data is : <class 'pandas.core.frame.DataFrame'>
the data type of movies_df_2array is: <class 'numpy.ndarray'>
We created a DataFrame called movies from a dictionary of key and value pairs. The keys, "Movies" and "Revenue", become the column labels. The values are a list of strings and a list of floats.
Example: "The Matrix Resurrections" is paired with 7.5.
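One detail worth checking is the dtype of the result. The sketch below rebuilds the movies DataFrame from Method 1 and compares converting the whole frame with converting only the numeric column. The names `full` and `revenue` are introduced here for illustration:

```python
import pandas as pd

movies = pd.DataFrame({
    "Movies": ["The Matrix Resurrections", "West Side Story", "SpiderMan No way Home"],
    "Revenue": [7.5, 3.0, 2.5],
})

# Mixing a string column with a float column forces a generic object dtype.
full = movies.to_numpy()
print(full.dtype, full.shape)        # object (3, 2)

# Converting only the numeric column keeps a float dtype.
revenue = movies["Revenue"].to_numpy()
print(revenue.dtype, revenue.shape)  # float64 (3,)
```

If the goal is numerical work, selecting the numeric columns before calling to_numpy() is usually the more useful choice, since object arrays lose most of NumPy's speed advantage.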
Method 2: The DataFrame.values Attribute
fake_data = pd.DataFrame({'State': ['New York', 'California', 'Florida'],
                          'City': ['Manhattan', 'Los Angeles', 'Miami'],
                          'Population': (7.5, 10.5, 6.2)})

fake_data.values  # only cell values from the dataframe will be returned as an array
Output:
array([['New York', 'Manhattan', 7.5],
       ['California', 'Los Angeles', 10.5],
       ['Florida', 'Miami', 6.2]], dtype=object)
The row and column labels are not included in the returned array!
The process is similar to Method 1: we build another DataFrame, called fake_data, from a dictionary of key and value pairs. The keys State, City, and Population become the column labels.
You get the idea by now.
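As a quick check, the sketch below rebuilds fake_data and compares the .values attribute with to_numpy(). The names `from_values` and `from_to_numpy` are introduced here for illustration. For a simple frame like this one, both give the same ndarray contents, and the pandas documentation recommends to_numpy() for new code:

```python
import numpy as np
import pandas as pd

fake_data = pd.DataFrame({
    "State": ["New York", "California", "Florida"],
    "City": ["Manhattan", "Los Angeles", "Miami"],
    "Population": [7.5, 10.5, 6.2],
})

from_values = fake_data.values
from_to_numpy = fake_data.to_numpy()

# Both results are NumPy ndarrays with the same contents.
print(type(from_values) is type(from_to_numpy))    # True
print(np.array_equal(from_values, from_to_numpy))  # True

# The labels are still available on the DataFrame itself.
print(list(fake_data.columns))  # ['State', 'City', 'Population']
```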
Method 3: The Series.array Attribute – The Least Common
one_dimensional_data = pd.Series([1, 2, 3, 4, 5])
'''
One_dimensional_data
0    1
1    2
2    3
3    4
4    5
dtype: int64
'''

new_array_from_series = one_dimensional_data.array
'''
[1, 2, 3, 4, 5]
Length: 5, dtype: int64
'''
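The .array attribute exists only on one-dimensional objects such as a Series or an Index, so make sure you are working with 1-dimensional data. Calling it on a DataFrame raises an AttributeError! Note also that .array returns a pandas extension array rather than a numpy.ndarray; use to_numpy() on the Series if you need a true NumPy array.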
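The following sketch shows what .array gives back compared with to_numpy(), and why it only applies to 1-dimensional data. The names `ext`, `as_numpy`, and `frame` are introduced here for illustration:

```python
import numpy as np
import pandas as pd

one_dimensional_data = pd.Series([1, 2, 3, 4, 5])

# .array returns a pandas extension array that wraps the data.
ext = one_dimensional_data.array
print(isinstance(ext, pd.api.extensions.ExtensionArray))  # True
print(isinstance(ext, np.ndarray))                        # False

# .to_numpy() returns an actual NumPy ndarray.
as_numpy = one_dimensional_data.to_numpy()
print(isinstance(as_numpy, np.ndarray))                   # True

# A DataFrame is 2-dimensional and has no .array attribute.
frame = one_dimensional_data.to_frame(name="numbers")
print(hasattr(frame, "array"))                            # False
```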
Conclusion
These are three common ways to turn pandas DataFrames and Series into arrays.
Remember, NumPy is essential to the data science world. Its arrays make it easy to locate each element by position and to perform vectorized operations that make computations fast and efficient. We can slice, reshape, join and split arrays!
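Here is a brief sketch of the operations just mentioned, using a small made-up array (all variable names are illustrative):

```python
import numpy as np

arr = np.arange(6)                    # [0 1 2 3 4 5]

sliced = arr[1:4]                     # slice: [1 2 3]
reshaped = arr.reshape(2, 3)          # reshape: 2 rows, 3 columns
joined = np.concatenate([arr, arr])   # join: 12 elements
first, second = np.split(arr, 2)      # split: two equal halves
scaled = arr * 10                     # vectorized: multiplies every element

print(sliced, reshaped.shape, joined.size, first, second, scaled)
```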