Python: Create List of Tuples from DataFrame

💡 Problem Formulation: When working with pandas DataFrames, you may need to convert the data into a list of tuples, where each tuple represents one row of the DataFrame.

Suppose we have a DataFrame containing employee information with the columns ['Name', 'Age', 'Department']. The desired output is a list of tuples like [('Alice', 30, 'HR'), ('Bob', 22, 'Sales'), ...].

Method 1: Iterrows

DataFrame.iterrows() is a straightforward way to iterate over DataFrame rows as (index, Series) pairs. Passing each row Series to tuple() produces a tuple of that row’s values. Note that iterrows() builds a new Series for every row, so it does not preserve dtypes across a row: for example, an integer column is upcast to float when another column holds floats.

Here’s an example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [30, 22],
    'Department': ['HR', 'Sales']
})

# Convert DataFrame to list of tuples
tuples_list = [tuple(row) for index, row in df.iterrows()]

print(tuples_list)

This code snippet creates a DataFrame from a dictionary of lists, representing employees’ data. By iterating over the DataFrame rows, each row is converted to a tuple and collected into a list.

Method 2: Itertuples

The DataFrame.itertuples() method is an efficient iterator that yields namedtuples of the rows, with the row’s index value as the first element of the tuple. To exclude the index, you can call itertuples(index=False).

Here’s an example:

# Convert DataFrame to list of tuples without the index
tuples_list = list(df.itertuples(index=False, name=None))

print(tuples_list)

This example calls itertuples() with index=False and name=None. The first argument drops the index from each row, and the second makes the method return regular tuples instead of namedtuples, so the result converts directly into a list of plain tuples.
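
For comparison, here is a minimal sketch of what itertuples() yields when the defaults are left in place, and when a custom namedtuple name is passed. The name 'Employee' is an illustrative choice, not something the method requires.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [30, 22],
    'Department': ['HR', 'Sales']
})

# Default call: each row is a namedtuple with the index as its first field
rows = list(df.itertuples())
first = rows[0]

# Fields are available by attribute name as well as by position
print(first.Index, first.Name, first.Age, first.Department)

# Custom namedtuple name (illustrative) and no index field
employees = list(df.itertuples(index=False, name='Employee'))
print(employees[1].Department)
```

Namedtuples are still tuples, so they can be used wherever a plain tuple is expected. Passing name=None is mainly useful when attribute access is not needed.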

Method 3: Zip with DataFrame Columns

Using the built-in Python function zip(), you can combine columns of a DataFrame into tuples. This approach is especially useful when you need to select specific columns to form the tuples.

Here’s an example:

# Convert specified DataFrame columns to list of tuples
tuples_list = list(zip(df['Name'], df['Age'], df['Department']))

print(tuples_list)

In this code snippet, zip() is called with the chosen DataFrame columns as arguments. It pairs up the values found at the same position in each column, so every resulting tuple corresponds to one row. The result is then converted into a list.
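
When the columns are only known at runtime, the same idea can be written with argument unpacking. The sketch below assumes a hypothetical list named columns that holds the column names to include.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [30, 22],
    'Department': ['HR', 'Sales']
})

# Hypothetical selection of columns, e.g. taken from user input or a config
columns = ['Name', 'Department']

# Unpack one Series per selected column into zip()
tuples_list = list(zip(*(df[col] for col in columns)))

print(tuples_list)
```

Each tuple then contains only the selected fields, in the order given by columns.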

Method 4: Apply Along Axis

Pandas’ DataFrame.apply() method can also be used with a lambda function to apply a transformation along a particular axis – in this case, axis=1 to iterate over rows. We can convert rows to tuples using a lambda function that returns a tuple.

Here’s an example:

# Convert DataFrame rows to list of tuples using apply
tuples_list = df.apply(lambda row: tuple(row), axis=1).tolist()

print(tuples_list)

In this snippet, the apply() method takes a lambda function that simply returns a tuple of the row. The axis=1 argument specifies that the function is applied to each row rather than to each column. apply() returns a Series of tuples, which tolist() converts to a list.
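
Because the lambda receives the whole row, it can also compute values while building the tuple. The following sketch uses an arbitrary, illustrative rule (an age threshold of 25) to show the idea.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [30, 22],
    'Department': ['HR', 'Sales']
})

# Build (name, flag) tuples; the threshold of 25 is only an example
tuples_list = df.apply(
    lambda row: (row['Name'], row['Age'] >= 25),
    axis=1
).tolist()

print(tuples_list)
```

For a plain conversion without any per-row logic, one of the other methods is usually the simpler choice.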

Bonus One-Liner Method 5: Values and Tuple Conversion

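For a minimalist approach, you can use the DataFrame’s .values attribute. It returns a NumPy array representation of the DataFrame (the pandas documentation recommends DataFrame.to_numpy() for the same purpose), which map() can turn into a list of tuples in a concise one-liner. Because a NumPy array has a single dtype, columns of different numeric types are converted to a common dtype first.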

Here’s an example:

# One-liner to convert DataFrame to list of tuples
tuples_list = list(map(tuple, df.values))

print(tuples_list)

This single line of code directly maps the tuple function onto the values of the DataFrame, resulting in a list of tuples, each representing a row of data.
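
One caveat is worth checking before relying on this approach: a NumPy array holds a single dtype. The sketch below uses a small hypothetical DataFrame, df_num, with one integer and one float column to compare the array-based conversion against itertuples().

```python
import pandas as pd

# Hypothetical numeric-only DataFrame: one int column, one float column
df_num = pd.DataFrame({'Age': [30, 22], 'Score': [4.5, 3.0]})

# Array-based conversion: the whole array is float, so Age becomes float
via_values = list(map(tuple, df_num.to_numpy()))

# Column-wise iteration keeps each column's own type
via_itertuples = list(df_num.itertuples(index=False, name=None))

print(type(via_values[0][0]))      # a float type
print(type(via_itertuples[0][0]))  # an integer type
```

In the employee example above the columns mix strings and integers, so the array has object dtype and the values come through unchanged. The conversion only matters when all columns are numeric but of different types.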

Summary/Discussion

  • Method 1: Iterrows. Strengths: Easy to understand and explicit in its iteration. Weaknesses: Builds a Series for every row, so it is generally slower than the other methods, especially on large DataFrames, and it does not preserve dtypes across a row.
  • Method 2: Itertuples. Strengths: More efficient than iterrows() and straightforward to use. Weaknesses: Requires index=False and name=None to get plain tuples without the index.
  • Method 3: Zip with DataFrame Columns. Strengths: Offers flexibility in selecting specific columns to create tuples. Weaknesses: Columns have to be listed explicitly, which becomes tedious for DataFrames with many columns.
  • Method 4: Apply Along Axis. Strengths: Versatile, can be used for more complex row-wise operations beyond just tuple conversion. Weaknesses: Calls a Python function for every row, so it is typically slower than itertuples() on larger datasets.
  • Bonus One-Liner Method 5: Values and Tuple Conversion. Strengths: Extremely concise and efficient. Weaknesses: May be less readable for those new to Python’s functional programming style, and numeric columns of different types are converted to a common dtype.
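
The speed claims above depend on the pandas version, the data, and the machine, so it is worth measuring on your own DataFrame. Here is a rough sketch using the standard-library timeit module. The DataFrame df_big and its size of 1,000 rows are arbitrary choices for illustration.

```python
import timeit

import pandas as pd

# Illustrative test data: the two example rows repeated 500 times
df_big = pd.DataFrame({
    'Name': ['Alice', 'Bob'] * 500,
    'Age': [30, 22] * 500,
    'Department': ['HR', 'Sales'] * 500,
})

# One callable per method discussed in this article
methods = {
    'iterrows': lambda: [tuple(row) for _, row in df_big.iterrows()],
    'itertuples': lambda: list(df_big.itertuples(index=False, name=None)),
    'zip': lambda: list(zip(df_big['Name'], df_big['Age'], df_big['Department'])),
    'apply': lambda: df_big.apply(lambda row: tuple(row), axis=1).tolist(),
    'values': lambda: list(map(tuple, df_big.values)),
}

# Run each method once to collect its output, then time three runs of each
results = {name: fn() for name, fn in methods.items()}
timings = {name: timeit.timeit(fn, number=3) for name, fn in methods.items()}

# Print the methods from fastest to slowest on this machine
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name:<11}{seconds:.4f} s')
```

The absolute numbers are not meaningful on their own. What matters is the relative ordering on data that resembles yours.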
