In data processing, you often need to convert a pandas DataFrame into individual rows, either to iterate through the data or to export records to other formats. The right method depends on the output you want. In this article, we explore how to turn a tabular DataFrame into rows represented as tuples, dictionaries, or JSON, which can then be iterated or serialized with ease.
Method 1: Iterating with .iterrows()
The .iterrows() function in pandas iterates through DataFrame rows as (index, Series) pairs, allowing users to easily convert rows into a list of tuples. This method is very straightforward and often used for row-wise operations.
Here’s an example:
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob'],
'age': [25, 30]
})
for index, row in df.iterrows():
    print((index, row.to_dict()))
Output:
(0, {'name': 'Alice', 'age': 25})
(1, {'name': 'Bob', 'age': 30})
This code snippet demonstrates the use of .iterrows() to iterate over each row in the DataFrame, returning the index along with a Series object, which we convert to a dictionary. It’s convenient for small DataFrames but can be inefficient on larger datasets.
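One caveat is easier to see in code: because each row comes back as a single Series, values from differently typed columns are coerced to a common dtype. The sketch below uses a small made-up frame (the `qty` and `price` columns are illustrative assumptions, not part of the example above) to show an integer column coming back as floats.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
mixed = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# Collect each row as a dictionary while iterating
rows = [row.to_dict() for _, row in mixed.iterrows()]
print(rows)

# The row Series has a single dtype, so the integer 'qty' values become floats
print(mixed['qty'].dtype, type(rows[0]['qty']))
```

If preserving per-column types matters, .itertuples() is usually the safer choice.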
Method 2: Using .itertuples()
The .itertuples() method is a more efficient way to iterate through rows in a DataFrame and can convert rows into namedtuples or regular tuples. It is faster than .iterrows() because it does not convert each row to a Series.
Here’s an example:
for row in df.itertuples(index=False):
    print(row)
Output:
Pandas(name='Alice', age=25)
Pandas(name='Bob', age=30)
By default, .itertuples() includes the DataFrame index as the first element of each tuple. Setting index=False excludes it, so each namedtuple holds only the row values, which are accessible as attributes.
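As a minimal sketch of the "regular tuples" option mentioned above, passing name=None to .itertuples() returns plain tuples, while the default namedtuples support attribute access and conversion to a dictionary with _asdict(). The frame is the same two-row example used throughout the article.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

# name=None yields plain tuples instead of namedtuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)

# Namedtuples expose each column as an attribute
first = next(df.itertuples(index=False))
print(first.name, first.age)

# _asdict() turns a namedtuple row into a mapping of column name to value
print(first._asdict())
```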
Method 3: Convert to a List of Dictionaries with .to_dict('records')
To convert each DataFrame row into a dictionary, the .to_dict('records') method can be used. This is handy when you want to serialize your DataFrame rows, for example to JSON.
Here’s an example:
rows_list = df.to_dict('records')
print(rows_list)
Output:
[{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
By calling df.to_dict('records'), we convert the DataFrame into a list of dictionaries, where each dictionary represents a row in the DataFrame, mapping column names to their respective values.
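To illustrate the serialization use case, here is a small sketch that passes the list of row dictionaries to the standard library's json module and reads it back. It assumes the columns hold only strings and integers, as in this example; timestamps or missing values would likely need extra handling.

```python
import json
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})
records = df.to_dict('records')

# Serialize the list of row dictionaries with the standard library
payload = json.dumps(records)
print(payload)

# Round trip back to Python objects to check that nothing was lost
restored = json.loads(payload)
print(restored == records)
```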
Method 4: Exporting to a JSON String with .to_json(orient='records')
When the goal is to convert the DataFrame into a JSON formatted string, the .to_json(orient='records') method is ideal. It serializes the DataFrame rows as JSON objects, which can be particularly useful for web APIs.
Here’s an example:
json_data = df.to_json(orient='records')
print(json_data)
Output:
[{"name":"Alice","age":25},{"name":"Bob","age":30}]
In this code snippet, the DataFrame is converted into a JSON string with the help of .to_json(orient='records'). Each entry corresponds to a row exported as a JSON object.
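The sketch below checks the shape of that string by parsing it back with the json module, and also shows the lines=True option, which writes one JSON object per line rather than a single array. Parsing line by line is just one way to consume that format, chosen here for illustration.

```python
import json
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

# Parse the JSON string back into Python objects to confirm its shape
json_data = df.to_json(orient='records')
parsed = json.loads(json_data)
print(parsed[0]['name'])

# lines=True emits one JSON object per line (JSON Lines) instead of an array
json_lines = df.to_json(orient='records', lines=True)
line_rows = [json.loads(line) for line in json_lines.splitlines() if line]
print(line_rows == parsed)
```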
Bonus One-Liner Method 5: Create List of Tuples with list(zip(...))
For a quick and dirty conversion of DataFrame rows into a list of tuples, using list(zip(*df.values.T)) is a nifty one-liner trick.
Here’s an example:
tuples_list = list(zip(*df.values.T))
print(tuples_list)
Output:
[('Alice', 25), ('Bob', 30)]
This code creates a list of tuples by zipping the transposed values of the DataFrame, where each tuple represents a row.
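For comparison, the following sketch puts the one-liner next to two alternatives that should give the same rows for this frame: zipping the columns directly, and .itertuples() with name=None. Which one reads best is a matter of taste; the alternatives are offered as options, not as the method the one-liner is meant to replace.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

# The one-liner: transpose to columns, then zip the columns back into rows
zipped = list(zip(*df.values.T))

# Zipping the columns directly avoids the transpose
by_column = list(zip(df['name'], df['age']))

# .itertuples() with name=None is a more explicit way to get plain tuples
explicit = list(df.itertuples(index=False, name=None))

print(zipped == by_column == explicit)
```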
Summary/Discussion
- Method 1: .iterrows(). Simple to use. Creates a row as Series. Not efficient for large DataFrames.
- Method 2: .itertuples(). More efficient than .iterrows(). Returns namedtuples with row data.
- Method 3: .to_dict('records'). Converts all rows into a list of dictionaries. Ideal for serialization.
- Method 4: .to_json(orient='records'). Directly converts DataFrame rows to a JSON string. Useful for web output.
- Method 5: list(zip(...)). Quick one-liner to get list of tuples from DataFrame rows. Easy but not explicit in column names.
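To get a rough feel for the efficiency difference between the first two methods, a timing sketch like the one below can help. The 2,000-row frame and the helper function names are made up for illustration, and the timings will vary by machine and pandas version, so treat the numbers as indicative only.

```python
import timeit
import pandas as pd

# Hypothetical 2,000-row frame, used only for timing
big = pd.DataFrame({'name': ['x'] * 2000, 'age': range(2000)})

def with_iterrows():
    # Builds one Series per row, then pulls the values out of it
    return [(row['name'], row['age']) for _, row in big.iterrows()]

def with_itertuples():
    # Yields plain tuples without creating a Series per row
    return list(big.itertuples(index=False, name=None))

# Both approaches should produce the same rows
same = with_iterrows() == with_itertuples()

t_rows = timeit.timeit(with_iterrows, number=3)
t_tuples = timeit.timeit(with_itertuples, number=3)
print(f"iterrows: {t_rows:.4f}s  itertuples: {t_tuples:.4f}s  same rows: {same}")
```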
