5 Best Ways to Convert a Pandas DataFrame to a Nested List

💡 Problem Formulation: When working with data in Python, it's common to use pandas DataFrames for data manipulation and analysis. Sometimes, however, you need the contents of a DataFrame as a nested list. For example, you might be interfacing with an API that only accepts JSON data, or you might want to serialize the data for networking or storage, where a nested list is more appropriate. This article demonstrates how to convert a pandas DataFrame into a nested list, with an example DataFrame as the input and a nested list as the desired output.

Method 1: Using values.tolist()

Chaining the values attribute with the tolist() method converts the DataFrame into a list of lists in which each sublist represents one row of the DataFrame. This approach is straightforward and preserves row order.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

nested_list = df.values.tolist()

Output:

[[1, 3], [2, 4]]

This code snippet creates a simple DataFrame with two columns and two rows. It then uses the values attribute to access the underlying NumPy array and the tolist() method to convert that array into a nested list, with each sublist corresponding to a row in the DataFrame.
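
One caveat is worth checking before relying on this approach: because values returns a single NumPy array, columns with different dtypes are coerced to a common dtype. The following minimal sketch illustrates this; the DataFrame mixed_df and its float column are assumptions added for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical DataFrame mixing an integer column with a float column.
mixed_df = pd.DataFrame({
    'A': [1, 2],
    'B': [0.5, 1.5]
})

# The underlying array has a single dtype (float64), so the integers are upcast.
mixed_list = mixed_df.values.tolist()
print(mixed_list)  # [[1.0, 0.5], [2.0, 1.5]]
```

If the integer type has to survive the conversion, a row-wise approach that reads each column separately may be a better fit.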

Method 2: Using a List Comprehension

A list comprehension offers a Pythonic way to convert a DataFrame into a nested list. It's flexible, allowing for additional logic during conversion, such as selective inclusion of columns or modification of data.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

nested_list = [row.tolist() for index, row in df.iterrows()]

Output:

[[1, 3], [2, 4]]

The snippet uses a list comprehension that iterates over the DataFrame rows with iterrows(), which yields each row as an (index, Series) pair. For each row, it extracts a list with tolist() and collects these lists into a larger list, resulting in a nested structure.
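
To illustrate the "additional logic" a comprehension allows, here is a minimal sketch that modifies values while building the list. It swaps iterrows() for itertuples(index=False, name=None), which yields plain tuples; the doubling of column B is an arbitrary transformation chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

# itertuples(index=False, name=None) yields one plain tuple per row.
# Doubling column B is an arbitrary, illustrative transformation.
custom_list = [[a, b * 2] for a, b in df.itertuples(index=False, name=None)]
print(custom_list)  # [[1, 6], [2, 8]]
```

Selecting a subset of columns first, for example df[['A']], would restrict which values reach the comprehension.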

Method 3: Using apply() with a Lambda Function

The apply() method allows applying a lambda function to each row or column of the DataFrame. This method can be customized to shape the nested list according to specific requirements.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

nested_list = df.apply(lambda x: x.tolist(), axis=1).tolist()

Output:

[[1, 3], [2, 4]]

In this code, apply() is used to apply a lambda function across the DataFrame rows (axis=1). The lambda function converts each row to a list, and then tolist() is called on the resulting Series to obtain the final nested list structure.
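
As a rough sketch of the customization mentioned above, the lambda can return any list it likes for each row. The extra third element (the row sum) and the name shaped_list are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

# Each row becomes [A, B, A + B]; the sum is purely illustrative.
# int() converts NumPy integers to plain Python ints.
shaped_list = df.apply(
    lambda row: [int(row['A']), int(row['B']), int(row['A'] + row['B'])],
    axis=1
).tolist()
print(shaped_list)  # [[1, 3, 4], [2, 4, 6]]
```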

Method 4: Using json.loads() and to_json()

Conversion to JSON and back to a list with Python's built-in json library can also be used. This method is especially useful when the end goal is to produce JSON formatted data.

Here's an example:

import pandas as pd
import json

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

json_data = df.to_json(orient='records')
nested_list = json.loads(json_data)

Output:

[{'A': 1, 'B': 3}, {'A': 2, 'B': 4}]

This code snippet serializes the DataFrame to a JSON formatted string with to_json(orient='records'). It then parses the JSON string back into a Python object, which results in a list of dictionaries.
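
If a list of lists is needed rather than a list of dictionaries, one possible variation is to pass orient='values' instead, which drops the column labels. The variable names below are illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

# orient='values' serializes only the cell values, row by row.
values_json = df.to_json(orient='values')
rows_only = json.loads(values_json)
print(rows_only)  # [[1, 3], [2, 4]]
```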

Bonus One-Liner Method 5: Using to_numpy() and tolist()

For an efficient one-liner, you can combine the use of to_numpy() to convert the DataFrame to a NumPy array and then apply tolist() to get a nested list.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

nested_list = df.to_numpy().tolist()

Output:

[[1, 3], [2, 4]]

This concise one-liner accomplishes the task at hand with minimal fuss. We use to_numpy() to get a NumPy array representation of the DataFrame and immediately convert this to a list of lists with tolist().
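
Note that the column names are lost in this conversion. If the consumer of the nested list expects a header row, a small extension along these lines could prepend it; the name table is a hypothetical choice for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

# Prepend the column labels as the first sublist, followed by the data rows.
table = [df.columns.tolist()] + df.to_numpy().tolist()
print(table)  # [['A', 'B'], [1, 3], [2, 4]]
```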

Summary/Discussion

  • Method 1: values.tolist(). This method is direct and maintains row order. However, it includes all columns by default and doesn't allow for transformation within the conversion process.
  • Method 2: List Comprehension. It is a flexible and readable approach, suitable for more complex transformations, but can be less efficient for large DataFrames.
  • Method 3: apply() with Lambda. Customizable and elegant, it comes at a potential cost of performance due to inherent row-wise operations in the apply method.
  • Method 4: JSON Conversion. Useful for web-related tasks, it results in a list of dictionaries rather than a list of lists and involves additional overhead from serialization and deserialization.
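  • Bonus Method 5: to_numpy().tolist(). This is concise and typically among the fastest options, performing much like values.tolist() because both go through a NumPy array; the pandas documentation recommends to_numpy() over values. Like values.tolist(), it lacks flexibility for row-wise customizations.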
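
The performance remarks above are easy to check on your own data. The following is a rough benchmarking sketch using the standard-library timeit module; the DataFrame size, the repeat count, and the names candidates, results, and timings are arbitrary assumptions, and the actual numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 1,000 rows and two integer columns.
df = pd.DataFrame({'A': range(1000), 'B': range(1000, 2000)})

# The row-oriented list-of-lists methods from this article.
candidates = {
    'values.tolist()': lambda: df.values.tolist(),
    'to_numpy().tolist()': lambda: df.to_numpy().tolist(),
    'iterrows()': lambda: [row.tolist() for _, row in df.iterrows()],
    'apply(axis=1)': lambda: df.apply(lambda r: r.tolist(), axis=1).tolist(),
}

# Run each method once to confirm they all produce the same nested list.
results = {name: fn() for name, fn in candidates.items()}

# Time three runs of each method and print them from fastest to slowest.
timings = {name: timeit.timeit(fn, number=3) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<22} {seconds:.4f} s")
```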