5 Best Ways to Convert a Pandas DataFrame to a List in Python

💡 Problem Formulation: Converting a Pandas DataFrame to a list in Python is a common operation when you need plain Python data structures, for tasks such as serialization, passing data to code that doesn't use pandas, or simple list manipulations. For instance, you might have a DataFrame df with some columns and want to transform it into a list of lists, where each inner list represents one row of the DataFrame.

Method 1: Using values.tolist()

This method uses the values attribute to get a NumPy representation of the DataFrame, then calls the array's tolist() method, which converts it to a list of lists. It's a straightforward and efficient approach that works best when all columns share the same data type, because mixed columns are cast to a single common dtype.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.values.tolist()

print(list_of_lists)

Output:

[[1, 4], [2, 5], [3, 6]]

This code snippet creates a DataFrame from a dictionary and converts it into a list of lists, with each inner list representing a DataFrame row. The values attribute gives a NumPy array, and the array's tolist() method converts it into nested standard Python lists.
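To make the "homogeneous data types" caveat concrete: values must return one NumPy array, so columns with different dtypes are cast to a common type. The sketch below uses made-up illustration data (an integer column next to a float column) and shows the integers coming back as floats.

```python
import pandas as pd

# Illustrative data: column A holds int64, column B holds float64
df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

arr = df.values              # one NumPy array for the whole frame
print(arr.dtype)             # float64: the int column is upcast
mixed = arr.tolist()
print(mixed)                 # [[1.0, 0.5], [2.0, 1.5]]
```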

Method 2: Using to_numpy().tolist()

The to_numpy() method, added in pandas 0.24.0, explicitly converts the DataFrame to a NumPy array, on which the standard tolist() method is then called. It states the intent more clearly than values, and the pandas documentation recommends it over values.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.to_numpy().tolist()

print(list_of_lists)

Output:

[[1, 4], [2, 5], [3, 6]]

In this example, to_numpy() is used to convert the DataFrame into a NumPy array, which is then converted to a list of lists using the tolist() method, yielding the same result as values.tolist().
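Unlike values, to_numpy() accepts a dtype argument. A minimal sketch, assuming you want each cell to keep its original Python type when columns differ in dtype: passing dtype=object is one way to do that. The frame below is made-up illustration data.

```python
import pandas as pd

# Illustrative data: an int64 column next to a float64 column
df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

# Default: a common dtype (float64) is chosen, so 1 becomes 1.0
default_rows = df.to_numpy().tolist()

# dtype=object: each cell is stored as a Python object, so ints stay ints
object_rows = df.to_numpy(dtype=object).tolist()

print(default_rows)  # [[1.0, 0.5], [2.0, 1.5]]
print(object_rows)   # [[1, 0.5], [2, 1.5]]
```

The trade-off is that an object array gives up NumPy's compact numeric storage, which rarely matters when the next step is building Python lists anyway.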

Method 3: Using apply() with axis=1

Another method uses the DataFrame's apply() method with axis=1, which calls a function once per row. Here a lambda converts each row (a Series) to a list, so apply() returns a Series of lists, which a final tolist() turns into a list of lists.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = df.apply(lambda x: x.tolist(), axis=1).tolist()

print(list_of_lists)

Output:

[[1, 4], [2, 5], [3, 6]]

The lambda passed to apply() converts each row to a list, and the outer tolist() converts the resulting Series of lists into a list of lists.
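The summary at the end of this article credits apply() with handling more complex row-wise transformations. As a sketch of what that can look like, the hypothetical helper row_with_product below appends a computed value to each row while converting it. Calling row.tolist() first keeps the results as plain Python ints rather than NumPy scalars.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def row_with_product(row):
    """Hypothetical helper: return the row's values plus A * B."""
    values = row.tolist()            # Series -> plain Python list
    return values + [values[0] * values[1]]

# Each call returns a list, so apply() yields a Series of lists
list_of_lists = df.apply(row_with_product, axis=1).tolist()
print(list_of_lists)  # [[1, 4, 4], [2, 5, 10], [3, 6, 18]]
```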

Method 4: Using iterrows()

The iterrows() method iterates over DataFrame rows directly, yielding each row as a Pandas Series. Each Series can be converted to a list and collected into the main list, here with a list comprehension.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = [row.tolist() for index, row in df.iterrows()]

print(list_of_lists)

Output:

[[1, 4], [2, 5], [3, 6]]

Here, iterrows() yields (index, row) pairs; the list comprehension ignores the index and converts each row Series to a list.
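The summary notes that iterrows() is slow because it builds a Series for every row. A related caveat is that each row Series has a single dtype, so mixed int/float rows come back as floats. The sketch below, using made-up data, compares it with itertuples(index=False, name=None), a different pandas method that yields plain tuples and keeps each column's type.

```python
import pandas as pd

# Illustrative data: int64 column A, float64 column B
df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

# iterrows(): each row becomes a float64 Series, so A's ints turn into floats
via_iterrows = [row.tolist() for _, row in df.iterrows()]

# itertuples(index=False, name=None): plain tuples, no per-row Series
via_itertuples = [list(t) for t in df.itertuples(index=False, name=None)]

print(via_iterrows)    # [[1.0, 0.5], [2.0, 1.5]]
print(via_itertuples)  # [[1, 0.5], [2, 1.5]]
```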

Bonus One-Liner Method 5: Using List Comprehension with iloc

A one-liner solution uses a list comprehension with iloc to access the DataFrame's rows directly. The iloc indexer selects a row by its integer position, and the resulting Series can be converted to a list.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
list_of_lists = [df.iloc[i].tolist() for i in range(len(df))]

print(list_of_lists)

Output:

[[1, 4], [2, 5], [3, 6]]

This example uses a list comprehension to iterate over the row positions, accessing each row with iloc and immediately converting it to a list.
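Because iloc selects by integer position rather than by index label, the comprehension keeps working when the DataFrame has a non-default index. The index labels 10, 20 and 30 below are arbitrary illustration data.

```python
import pandas as pd

# Same values as the example above, but with custom (non-0-based) index labels
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

# iloc uses positions 0..len(df)-1, independent of the labels
by_position = [df.iloc[i].tolist() for i in range(len(df))]
print(by_position)  # [[1, 4], [2, 5], [3, 6]]

# Label-based loc would fail here: 0 is not one of the index labels
try:
    df.loc[0]
except KeyError:
    print("df.loc[0] raises KeyError")
```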

Summary/Discussion

  • Method 1: Using values.tolist(). Simple and fast. Best for homogeneous data types. May not be the most explicit method for Pandas newcomers, and the pandas documentation recommends to_numpy() instead.
  • Method 2: Using to_numpy().tolist(). Explicit and clear, especially for those familiar with NumPy. Essentially identical to Method 1 in output.
  • Method 3: Using apply(). Good for complex row-wise transformations. Slightly more verbose and may be overkill for simple conversions.
  • Method 4: Using iterrows(). Direct iteration over rows. Can be slow for large datasets because it builds a Series for each row, and it does not preserve dtypes across rows.
  • Bonus Method 5: Using list comprehension with iloc. Compact one-liner. Offers direct positional access to DataFrame rows but is typically less efficient than Methods 1 and 2 on large datasets.
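The performance remarks above are qualitative. A quick way to check them on your own data is to time each method with the standard-library timeit module. The sketch below uses a synthetic 2,000-row frame (an arbitrary size), first confirms that all five methods return the same list, and then prints timings without asserting any particular ranking, since results depend on the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 2,000 rows x 3 integer columns (size chosen arbitrarily)
df = pd.DataFrame(np.arange(6_000).reshape(2_000, 3), columns=['A', 'B', 'C'])

methods = {
    'values.tolist()': lambda: df.values.tolist(),
    'to_numpy().tolist()': lambda: df.to_numpy().tolist(),
    'apply(axis=1)': lambda: df.apply(lambda x: x.tolist(), axis=1).tolist(),
    'iterrows()': lambda: [row.tolist() for _, row in df.iterrows()],
    'iloc comprehension': lambda: [df.iloc[i].tolist() for i in range(len(df))],
}

# All five methods should agree on the result for this all-integer frame
reference = methods['values.tolist()']()
results = {name: fn() for name, fn in methods.items()}
assert all(r == reference for r in results.values())

# Average wall-clock time over a few calls per method
for name, fn in methods.items():
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name:<22} {seconds * 1000:8.2f} ms per call')
```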