💡 Problem Formulation: Converting a Pandas DataFrame to a list is a common operation when you need plain Python data structures for tasks such as serialization, data transfer, or simple list manipulation. For instance, you might have a DataFrame df with some columns and want to transform it into a list of lists, where each inner list represents one row of the DataFrame.
Method 1: Using values.tolist()
This method uses the values attribute to get a NumPy representation of the DataFrame, followed by the tolist() method, which converts the NumPy array to a list of lists. It's a straightforward and efficient method for DataFrames with homogeneous data types; with mixed types, the values are first upcast to a common dtype.
Here’s an example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.values.tolist()
print(list_of_lists)
Output:
[[1, 4], [2, 5], [3, 6]]
This code snippet creates a DataFrame from a dictionary and converts it into a list of lists, with each inner list representing a DataFrame row. The values attribute gives a NumPy array, and the tolist() method converts that array into a standard Python list.
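To illustrate why homogeneous data types matter here, the following is a minimal sketch using a hypothetical frame df_mixed (not part of the original example) that combines an integer column with a float column. Because a NumPy array holds a single dtype, the integers come back as floats.

```python
import pandas as pd

# Hypothetical frame mixing an integer column with a float column.
df_mixed = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

# NumPy needs one dtype for the whole array, so column A is upcast to float.
rows = df_mixed.values.tolist()
print(rows)  # [[1.0, 0.5], [2.0, 1.5]]
```

If the original integer type matters downstream, it is worth checking the result before relying on it.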
Method 2: Using to_numpy() and tolist()
The to_numpy() method is a newer addition to Pandas (available since version 0.24) that explicitly converts the DataFrame to a NumPy array, on which the standard tolist() method is then called. This states the intent more clearly than the values attribute does.
Here’s an example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.to_numpy().tolist()
print(list_of_lists)
Output:
[[1, 4], [2, 5], [3, 6]]
In this example, to_numpy() converts the DataFrame into a NumPy array, which is then converted to a list of lists using the tolist() method, yielding the same result as values.tolist().
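As a small sketch of the equivalence described above, the snippet below compares both spellings on the same frame. It also uses the dtype argument of to_numpy(), which the original text does not cover, to show one thing the explicit method allows that the values attribute does not.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# For a plain numeric frame, both spellings produce the same list of lists.
same = df.to_numpy().tolist() == df.values.tolist()
print(same)  # True

# to_numpy() also accepts a dtype argument to control the element type.
as_floats = df.to_numpy(dtype=float).tolist()
print(as_floats)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```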
Method 3: Using apply() with axis=1
Another method uses a DataFrame’s apply() method with axis=1 to iterate over rows. Inside apply(), a lambda function converts each row to a list, yielding a Series of lists that can easily be converted to a list of lists.
Here’s an example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.apply(lambda x: x.tolist(), axis=1).tolist()
print(list_of_lists)
Output:
[[1, 4], [2, 5], [3, 6]]
The apply() method with a lambda expression converts each row to a list, and the final tolist() call converts the resulting Series of lists into a list of lists.
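Where apply() becomes more useful is when each row needs some processing before it is turned into a list. The following is a minimal sketch of that idea; the doubling step is an arbitrary illustration, not something from the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Double every value in the row before turning the row into a list.
doubled = df.apply(lambda row: (row * 2).tolist(), axis=1).tolist()
print(doubled)  # [[2, 8], [4, 10], [6, 12]]
```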
Method 4: Using iterrows()
By using the iterrows() method, we can iterate over DataFrame rows directly. Each row is represented by a Pandas Series during iteration, which can be converted to a list and then appended to the main list.
Here’s an example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = [row.tolist() for _, row in df.iterrows()]
print(list_of_lists)
Output:
[[1, 4], [2, 5], [3, 6]]
Here, the iterrows() method generates (index, row) pairs. We ignore the index and convert each row Series to a list inside a list comprehension.
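One consequence of each row being a Series is that a row has a single dtype, so mixed columns are upcast row by row. The sketch below shows this on a hypothetical frame df_mixed. For comparison it also uses itertuples(), a related Pandas method not covered in this article, which yields plain tuples instead.

```python
import pandas as pd

df_mixed = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

# iterrows() packs each row into one Series, so the ints become floats.
via_iterrows = [row.tolist() for _, row in df_mixed.iterrows()]
print(via_iterrows)  # [[1.0, 0.5], [2.0, 1.5]]

# itertuples() yields plain tuples and leaves each value's type alone.
via_itertuples = [list(t) for t in df_mixed.itertuples(index=False, name=None)]
print(via_itertuples)  # [[1, 0.5], [2, 1.5]]
```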
Bonus One-Liner Method 5: Using List Comprehension with iloc
A one-liner solution uses a list comprehension with iloc to access the DataFrame’s rows directly. The iloc indexer retrieves a row by its integer position, and the resulting Series can be converted to a list.
Here’s an example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = [df.iloc[i].tolist() for i in range(len(df))]
print(list_of_lists)
Output:
[[1, 4], [2, 5], [3, 6]]
This example uses a list comprehension to iterate through the index positions and accesses each row with iloc, immediately converting each row to a list.
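Because iloc works on integer positions, the same pattern can pick out only some of the rows. As a small sketch, the variant below keeps every second row; the step of 2 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Take every second row by integer position: positions 0 and 2.
every_other = [df.iloc[i].tolist() for i in range(0, len(df), 2)]
print(every_other)  # [[1, 4], [3, 6]]
```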
Summary/Discussion
- Method 1: Using values.tolist(). Simple and fast. Best for homogeneous data types. May not be the most explicit method for Pandas newcomers.
- Method 2: Using to_numpy().tolist(). Explicit and clear, especially for those familiar with NumPy. Essentially identical to Method 1 in output.
- Method 3: Using apply(). Good for complex row-wise transformations. Slightly more verbose and may be overkill for simple conversions.
- Method 4: Using iterrows(). Direct iteration over rows. Can be slow for large datasets because it builds a Series for each row.
- Bonus Method 5: Using a list comprehension with iloc. Compact one-liner. Offers direct access to DataFrame rows but can be less efficient than the other methods for large datasets.
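To check the performance remarks above on your own machine, a rough timing sketch with the standard timeit module might look like the following. The 1,000-row test frame and the repeat count of 3 are arbitrary assumptions, and absolute timings will vary with hardware and library versions, so no figures are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical test frame; 1,000 rows keeps the run short.
df = pd.DataFrame({'A': range(1000), 'B': range(1000)})

candidates = {
    'values.tolist()': lambda: df.values.tolist(),
    'to_numpy().tolist()': lambda: df.to_numpy().tolist(),
    'apply(axis=1)': lambda: df.apply(lambda r: r.tolist(), axis=1).tolist(),
    'iterrows()': lambda: [row.tolist() for _, row in df.iterrows()],
    'iloc': lambda: [df.iloc[i].tolist() for i in range(len(df))],
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=3)  # total time for 3 calls
    print(f'{name:<22}{seconds:.4f} s')
```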