5 Best Ways to Convert a pandas DataFrame to Rows

💡 Problem Formulation:

In data processing, you often need to convert a pandas DataFrame into individual rows, either to iterate over the data or to export records to other formats. The right method depends on the output you want. This article explores how to convert a tabular DataFrame into rows represented as tuples, dictionaries, or JSON, which can then be iterated or serialized with ease.

Method 1: Iterating with .iterrows()

The .iterrows() method in pandas iterates through DataFrame rows as (index, Series) pairs, which makes it easy to convert each row into another structure such as a dictionary or a tuple. This method is straightforward and often used for row-wise operations.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

for index, row in df.iterrows():
    print((index, row.to_dict()))

Output:

(0, {'name': 'Alice', 'age': 25})
(1, {'name': 'Bob', 'age': 30})

This code snippet demonstrates the use of .iterrows() to iterate over each row in the DataFrame, returning the index along with a Series object, which we convert to a dictionary. It's convenient for small DataFrames but can be slow on larger datasets, because a new Series is built for every row.
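If the goal is a reusable list rather than printed output, the same loop can be collected with a list comprehension. The following is a minimal sketch that rebuilds the two-row example DataFrame so it runs on its own; the variable name `rows` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# Collect every (index, row-as-dict) pair into a list instead of printing it
rows = [(index, row.to_dict()) for index, row in df.iterrows()]
print(rows)
```

Each element of `rows` pairs the index label with a plain dictionary, so the list can be passed around without keeping the DataFrame itself.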

Method 2: Using .itertuples()

The .itertuples() method is a more efficient way to iterate through rows in a DataFrame and can convert rows into namedtuples or regular tuples. It is faster than .iterrows() because it does not convert each row to a Series.

Here's an example:

for row in df.itertuples(index=False):
    print(row)

Output:

Pandas(name='Alice', age=25)
Pandas(name='Bob', age=30)

By default, .itertuples() includes the DataFrame index as the first element of each tuple. Setting index=False excludes it, so each namedtuple holds only the column values, which can be accessed as attributes such as row.name and row.age.
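As a small sketch of how the resulting rows might be used, the snippet below reads a field by attribute, converts one row to a dictionary with the standard namedtuple method `_asdict()`, and passes `name=None` to get regular tuples. It rebuilds the example DataFrame so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# Attribute access on each namedtuple row
names = [row.name for row in df.itertuples(index=False)]

# Convert a single namedtuple row to a dictionary
first_row = next(df.itertuples(index=False))
first_as_dict = first_row._asdict()

# name=None yields regular tuples instead of namedtuples
plain_tuples = list(df.itertuples(index=False, name=None))

print(names)
print(first_as_dict)
print(plain_tuples)
```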

Method 3: Convert to a List of Dictionaries with .to_dict('records')

To convert each DataFrame row into a dictionary, the .to_dict('records') method can be used. This is handy when you want to serialize your DataFrame rows, for example to JSON.

Here's an example:

rows_list = df.to_dict('records')
print(rows_list)

Output:

[{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]

By calling df.to_dict('records'), we convert the DataFrame into a list of dictionaries, where each dictionary represents a row in the DataFrame, mapping column names to their respective values.
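To illustrate the serialization use case, here is a minimal sketch that feeds the list of dictionaries to the standard-library `json` module. It assumes the simple string and integer columns of the example; columns holding timestamps or other non-JSON types would likely need extra handling.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# One dictionary per row, keyed by column name
records = df.to_dict('records')

# Serialize the list of dictionaries to a JSON string
json_text = json.dumps(records)
print(json_text)
```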

Method 4: Exporting to a JSON String with .to_json(orient='records')

When the goal is to convert the DataFrame into a JSON formatted string, the .to_json(orient='records') method is ideal. It serializes the DataFrame rows as JSON objects, which can be particularly useful for web APIs.

Here's an example:

json_data = df.to_json(orient='records')
print(json_data)

Output:

[{"name":"Alice","age":25},{"name":"Bob","age":30}]

In this code snippet, the DataFrame is converted into a JSON string with the help of .to_json(orient='records'). Each entry corresponds to a row exported as a JSON object.
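As a hedged sketch of what a consumer of this output might do, the snippet below parses the string back with `json.loads` and also shows the `lines=True` option, which writes one JSON object per line (JSON Lines) and is only available together with orient='records'. The variable names are illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# A JSON array of row objects, parsed back into Python structures
json_data = df.to_json(orient='records')
parsed = json.loads(json_data)

# JSON Lines: one row object per line
json_lines = df.to_json(orient='records', lines=True)
line_records = [json.loads(line) for line in json_lines.splitlines() if line]

print(parsed)
print(line_records)
```

Both approaches recover the same list of row dictionaries; the line-delimited form can be convenient when records are streamed or appended to a log one at a time.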

Bonus One-Liner Method 5: Create List of Tuples with list(zip(...))

For a quick and dirty conversion of DataFrame rows into a list of tuples, using list(zip(*df.values.T)) is a nifty one-liner trick.

Here's an example:

tuples_list = list(zip(*df.values.T))
print(tuples_list)

Output:

[('Alice', 25), ('Bob', 30)]

This code creates a list of tuples by zipping the transposed values of the DataFrame, where each tuple represents a row.
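One caveat worth illustrating: .values returns a single NumPy array, so columns of different numeric types are upcast to a common dtype before zipping. The sketch below uses a hypothetical all-numeric DataFrame (`scores`, not part of the example above) and compares the one-liner with zipping the columns directly, which keeps each column's own type.

```python
import pandas as pd

# Hypothetical frame: an integer column next to a float column
scores = pd.DataFrame({'id': [1, 2], 'score': [0.5, 1.5]})

# Going through .values upcasts the integer ids to floats
via_values = list(zip(*scores.values.T))

# Zipping the columns themselves keeps the ids as integers
via_columns = list(zip(*(scores[col] for col in scores.columns)))

print(via_values)
print(via_columns)
```

For the name/age example the difference does not show up, because mixing strings and integers produces an object array that leaves each value unchanged.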

Summary/Discussion

  • Method 1: .iterrows(). Simple to use. Yields each row as an (index, Series) pair. Not efficient for large DataFrames.
  • Method 2: .itertuples(). More efficient than .iterrows(). Returns namedtuples with row data.
  • Method 3: .to_dict('records'). Converts all rows into a list of dictionaries. Ideal for serialization.
  • Method 4: .to_json(orient='records'). Directly converts DataFrame rows to a JSON string. Useful for web output.
  • Method 5: list(zip(...)). Quick one-liner to get list of tuples from DataFrame rows. Easy but not explicit in column names.
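
The efficiency notes above can be checked on your own machine with a rough timing sketch like the one below. The synthetic 10,000-row DataFrame and the helper names `sum_with_iterrows` and `sum_with_itertuples` are assumptions made for illustration, and the measured times will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Synthetic data, only for timing purposes
df_big = pd.DataFrame({
    'name': ['x'] * 10_000,
    'age': list(range(10_000))
})

def sum_with_iterrows(frame):
    """Sum the age column row by row using .iterrows()."""
    total = 0
    for _, row in frame.iterrows():
        total += row['age']
    return total

def sum_with_itertuples(frame):
    """Sum the age column row by row using .itertuples()."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.age
    return total

# Time a single pass of each approach
t_iterrows = timeit.timeit(lambda: sum_with_iterrows(df_big), number=1)
t_itertuples = timeit.timeit(lambda: sum_with_itertuples(df_big), number=1)

print(f"iterrows:   {t_iterrows:.4f} s")
print(f"itertuples: {t_itertuples:.4f} s")
```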