5 Best Ways to Create a List of Tuples From CSV in Python

💡 Problem Formulation: You want to read a Comma Separated Values (CSV) file and convert its rows into a list of tuples in Python. A common use case could be processing spreadsheet data to perform operations on each row. For example, given a CSV with columns ‘Name’ and ‘Age’, you want to convert the rows from something like:

Name, Age
Alice, 23
Bob, 30

to a list of tuples like:

[('Alice', 23), ('Bob', 30)]

Notice that ‘Age’ should be converted to an integer.

Method 1: Using csv.reader

The csv.reader approach in Python allows for reading CSV files row by row and converting each row into a tuple. With minor casting as necessary (e.g., converting string representations of numbers to actual numerical types), it provides a customizable way to create lists of tuples from CSV data.

Here’s an example:

import csv

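def csv_to_list_of_tuples(filename):
    # newline='' lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as file:
        reader = csv.reader(file)
        next(reader)  # Skip the header
        # int() tolerates the space after the comma, e.g. ' 23'
        return [(row[0], int(row[1])) for row in reader]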

tuples_list = csv_to_list_of_tuples('people.csv')
print(tuples_list)

Output:

[('Alice', 23), ('Bob', 30)]

This code snippet opens a file named ‘people.csv’, uses csv.reader to read the file, skips the header row, and then constructs a list of tuples with appropriate type conversion for the ‘Age’ field.
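When a file has more than two columns, hard-coding an index per column gets tedious. The following is a minimal sketch of one way to generalize the idea by passing one converter per column. The helper name rows_to_tuples, the converters parameter, and the in-memory SAMPLE string (standing in for people.csv) are assumptions made for illustration, not part of the original method.

```python
import csv
import io

# Hypothetical sample data standing in for people.csv
SAMPLE = "Name, Age\nAlice, 23\nBob, 30\n"

def rows_to_tuples(file_obj, converters):
    """Read CSV rows and apply one converter per column (illustrative helper)."""
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(file_obj, skipinitialspace=True)
    next(reader)  # Skip the header row
    return [
        tuple(convert(value) for convert, value in zip(converters, row))
        for row in reader
    ]

tuples_list = rows_to_tuples(io.StringIO(SAMPLE), converters=(str, int))
print(tuples_list)  # [('Alice', 23), ('Bob', 30)]
```

Any callable that accepts a string, such as float, can serve as a converter here.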

Method 2: Using pandas

Pandas is a powerful data manipulation library in Python that provides a convenient function called read_csv() which can be used to load CSV data into a DataFrame. From the DataFrame, one can easily convert the data into a list of tuples.

Here’s an example:

import pandas as pd

df = pd.read_csv('people.csv')
tuples_list = [tuple(x) for x in df.to_numpy()]
print(tuples_list)

Output:

[('Alice', 23), ('Bob', 30)]

This code snippet loads the CSV into a pandas DataFrame using read_csv and then converts the DataFrame to a NumPy array with to_numpy(). Each row of the array is cast to a tuple to form the final list of tuples.

Method 3: Using csv.DictReader

csv.DictReader is an alternative that reads each row of the CSV file into a dictionary keyed by column name (a plain dict since Python 3.8, an OrderedDict in Python 3.6 and 3.7). This makes it easier to handle CSV data by column names. The list of tuples can be created by iterating through the DictReader and building a tuple from each row's values, casting them to the appropriate types.

Here’s an example:

import csv

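def csv_to_list_of_tuples_using_dictreader(filename):
    with open(filename, 'r', newline='') as file:
        # skipinitialspace=True turns the header ' Age' into the key 'Age'
        dict_reader = csv.DictReader(file, skipinitialspace=True)
        return [(row['Name'], int(row['Age'])) for row in dict_reader]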

tuples_list = csv_to_list_of_tuples_using_dictreader('people.csv')
print(tuples_list)

Output:

[('Alice', 23), ('Bob', 30)]

This code snippet reads the CSV into a dict-like structure using csv.DictReader and constructs a list of tuples where the values are accessed by keys (column names) and type casting is performed on the ‘Age’ field to convert it into an integer.
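Because DictReader takes its keys from the header row, a space after the comma in the header (as in the sample file above) becomes part of the key. The sketch below shows the difference that csv's skipinitialspace option makes. The in-memory SAMPLE string is an assumption standing in for people.csv.

```python
import csv
import io

# Hypothetical sample data standing in for people.csv
SAMPLE = "Name, Age\nAlice, 23\nBob, 30\n"

# Default dialect: the leading space stays in the second column name
plain = csv.DictReader(io.StringIO(SAMPLE))
print(plain.fieldnames)    # ['Name', ' Age']

# skipinitialspace=True ignores whitespace right after each delimiter
trimmed = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
print(trimmed.fieldnames)  # ['Name', 'Age']

tuples_list = [(row['Name'], int(row['Age'])) for row in trimmed]
print(tuples_list)         # [('Alice', 23), ('Bob', 30)]
```

Without the option, looking up row['Age'] on the first reader would raise a KeyError, since the actual key is ' Age'.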

Method 4: Using sqlite3 and CSV module

For an approach that lets you query the data with SQL, you can use the sqlite3 module to create an in-memory SQL database, load the CSV rows into it, and then query the results back as a list of tuples.

Here’s an example:

import csv
import sqlite3

def csv_to_list_of_tuples_using_sqlite(filename):
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('CREATE TABLE people (Name text, Age integer);')

    with open(filename, 'r', newline='') as file:
        # skipinitialspace=True turns the header ' Age' into the key 'Age'
        dr = csv.DictReader(file, skipinitialspace=True)
        to_db = [(i['Name'], int(i['Age'])) for i in dr]

    cursor.executemany("INSERT INTO people (Name, Age) VALUES (?, ?);", to_db)

    cursor.execute("SELECT * FROM people;")
    rows = cursor.fetchall()
    connection.close()  # Release the in-memory database
    return rows

tuples_list = csv_to_list_of_tuples_using_sqlite('people.csv')
print(tuples_list)

Output:

[('Alice', 23), ('Bob', 30)]

After reading the CSV with csv.DictReader, the code inserts the rows into an SQL table and obtains the list of tuples by fetching all rows from the database. This approach is flexible when you need SQL filtering, sorting, or aggregation, although this example still builds the full list of rows in memory before inserting them.
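The SQL route pays off mainly when you filter or sort before building the list. Here is a minimal sketch, assuming a hypothetical helper query_csv, a min_age parameter, and an in-memory SAMPLE string standing in for people.csv. It passes a generator to executemany so the rows are not collected in an intermediate list first.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for people.csv
SAMPLE = "Name, Age\nAlice, 23\nBob, 30\n"

def query_csv(file_obj, min_age):
    """Load CSV rows into in-memory SQLite and return rows with Age >= min_age."""
    connection = sqlite3.connect(':memory:')
    try:
        connection.execute('CREATE TABLE people (Name TEXT, Age INTEGER)')
        reader = csv.DictReader(file_obj, skipinitialspace=True)
        # Generator expression: rows are handed to SQLite one at a time
        rows = ((row['Name'], int(row['Age'])) for row in reader)
        connection.executemany(
            'INSERT INTO people (Name, Age) VALUES (?, ?)', rows)
        cursor = connection.execute(
            'SELECT Name, Age FROM people WHERE Age >= ? ORDER BY Age',
            (min_age,))
        return cursor.fetchall()
    finally:
        connection.close()  # Always release the database

print(query_csv(io.StringIO(SAMPLE), min_age=25))  # [('Bob', 30)]
```

The ? placeholders keep the query parameterized, so min_age is never spliced into the SQL string.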

Bonus One-Liner Method 5: Using List Comprehension with the open() Function

For those who prefer a concise one-liner, if the data does not need much processing (like type conversion), Python’s file open() function can be employed in combination with list comprehension for rapidly converting CSV rows to a list of tuples.

Here’s an example:

tuples_list = [tuple(line.strip().split(',')) for line in open('people.csv', 'r').readlines()[1:]]
print(tuples_list)

Output:

[('Alice', ' 23'), ('Bob', ' 30')]

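This one-liner reads each line of the file, strips the surrounding whitespace (including the newline), splits by comma, and directly creates a list of tuples, skipping the first line, which typically contains the headers. All values stay strings, and the space after each comma is kept, which is why the output shows ' 23' rather than 23.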
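Splitting on commas also ignores CSV quoting, which matters as soon as a field contains a comma. The sketch below compares the naive split with csv.reader on a made-up row. The SAMPLE string and the name "Smith, John" are assumptions for illustration only.

```python
import csv
import io

# Hypothetical data: the first record has a quoted field containing a comma
SAMPLE = 'Name, Age\n"Smith, John", 40\nAlice, 23\n'

lines = SAMPLE.splitlines()[1:]  # Drop the header line

# Naive approach from the one-liner: split every comma, quoted or not
naive = [tuple(line.strip().split(',')) for line in lines]
print(naive[0])   # ('"Smith', ' John"', ' 40')

# csv.reader accepts any iterable of strings and honors the quotes
parsed = [tuple(row) for row in csv.reader(lines, skipinitialspace=True)]
print(parsed[0])  # ('Smith, John', '40')
```

For files that may contain quoted fields, the csv-based methods above are the safer choice.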

Summary/Discussion

  • Method 1: Using csv.reader. Strengths: Part of the standard library, straightforward and simple. Weaknesses: Manual type conversion may be necessary.
  • Method 2: Using pandas. Strengths: Easy to handle and powerful for data manipulation. Weaknesses: Requires an external library and might be overkill for simple tasks.
  • Method 3: Using csv.DictReader. Strengths: Clean code that accesses elements by column names. Weaknesses: Similar to csv.reader in performance and also requires manual type conversions.
  • Method 4: Using sqlite3 and CSV module. Strengths: Allows SQL filtering, sorting, and aggregation before the list is built. Weaknesses: More code and complexity compared to other methods.
  • Bonus One-Liner Method 5: Strengths: Incredibly compact and requires no external libraries. Weaknesses: No type conversion or whitespace cleanup, breaks on quoted fields that contain commas, and not as readable as more verbose methods.
