5 Best Ways to Convert CSV to JSON with Nested Structures in Python

💡 Problem Formulation: Converting data from CSV to JSON format is a common task in data processing. However, when dealing with complex data structures, the desired JSON output often requires nested objects. For example, if we have a CSV input where rows represent employees and columns represent personal details, departments, and projects, our goal may be to create a nested JSON where each employee is an object with nested arrays for departments and projects.

Method 1: Using the csv and json Standard Libraries

This method uses Python’s built-in csv module to read the CSV file and the json module to serialize the result as nested JSON. It gives you full control over how each row is transformed, at the cost of building the nested structure by hand.

Here’s an example:

import csv
import json

# Initialize an empty list to collect the employee records
employees = []

# Open the CSV (the csv module recommends newline='') and read rows as dictionaries
with open('employees.csv', mode='r', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Manually build the nested structure
        employee = {
            'name': row['name'],
            'details': {'age': row['age'], 'title': row['title']},
            'departments': row['departments'].split(';'),
            'projects': row['projects'].split(';')
        }
        employees.append(employee)

# Output the JSON
json_output = json.dumps(employees, indent=4)
print(json_output)

The output will be a JSON formatted string with nested arrays and objects based on the CSV content.

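This code first initializes an empty list called employees. It then reads each row from the CSV file as a dictionary, constructs the nested structure manually (splitting the semicolon-separated departments and projects cells into lists), and appends it to the employees list. Finally, it converts the list to a JSON formatted string and prints it. Note that csv.DictReader returns every value as a string, so age appears in the JSON as a quoted string unless you convert it with int().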
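A minimal sketch of the same approach run against an in-memory sample, with age converted to an integer and empty list cells dropped. The sample rows and the helper names split_list and row_to_employee are illustrative assumptions, not part of the original example.

```python
import csv
import io
import json

# Hypothetical sample data standing in for employees.csv
SAMPLE_CSV = """name,age,title,departments,projects
Alice,34,Engineer,R&D;Platform,Apollo;Zephyr
Bob,45,Manager,Sales,
"""

def split_list(cell, sep=';'):
    """Split a delimited cell into a list, skipping empty items."""
    return [item.strip() for item in cell.split(sep) if item.strip()]

def row_to_employee(row):
    """Build the nested structure for one CSV row."""
    return {
        'name': row['name'],
        'details': {'age': int(row['age']), 'title': row['title']},
        'departments': split_list(row['departments']),
        'projects': split_list(row['projects']),
    }

# io.StringIO lets csv.DictReader read the sample as if it were a file
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
employees = [row_to_employee(row) for row in reader]
print(json.dumps(employees, indent=4))
```

Here Bob's empty projects cell becomes an empty list rather than a list containing one empty string. If a cell such as age can be blank in your data, int() will raise a ValueError, so a real script would need to decide how to handle that case.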

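Method 2: Using Pandas

Pandas is a powerful data manipulation library in Python. It can read a CSV file into a DataFrame, transform whole columns at once (for example, splitting delimited strings into lists), and serialize the result with to_json. Note that json_normalize, which is often mentioned in this context, works in the opposite direction: it flattens nested JSON into a table, so it suits converting nested JSON back to CSV rather than building nested output.

Here’s an example:

import pandas as pd

# Read the CSV into a DataFrame
df = pd.read_csv('employees.csv')

# Split the semicolon-separated columns into Python lists
for col in ['departments', 'projects']:
    df[col] = df[col].fillna('').str.split(';')

# Group the flat age/title columns into a nested 'details' object per row
df['details'] = df[['age', 'title']].to_dict(orient='records')

# Keep only the columns that belong in the output and serialize them
json_output = df[['name', 'details', 'departments', 'projects']].to_json(orient='records', indent=4)
print(json_output)

The output will be a JSON formatted string in which each employee has a details object and departments and projects arrays.

The code snippet above reads the CSV file into a DataFrame, turns the delimited columns into lists with str.split, and packs the age and title columns into one dictionary per row. Because to_json serializes list and dictionary cells as JSON arrays and objects, the selected columns are written out as nested JSON. Unlike the csv module, read_csv infers column types, so age is emitted as a number.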
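For reference, json_normalize is the tool for the flattening direction, that is, turning nested records back into a table. A small sketch, assuming pandas is installed and using hypothetical records shaped like the employee objects in this article:

```python
import pandas as pd

# Hypothetical nested records, shaped like the employee objects in this article
records = [
    {'name': 'Alice', 'details': {'age': 34, 'title': 'Engineer'},
     'departments': ['R&D', 'Platform']},
    {'name': 'Bob', 'details': {'age': 45, 'title': 'Manager'},
     'departments': ['Sales']},
]

# Nested dict keys become dotted column names; list values are kept as lists
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# Join list cells back into semicolon-separated strings before writing a CSV
flat['departments'] = flat['departments'].str.join(';')
print(flat.to_csv(index=False))
```

This makes the division of labor explicit: column operations and to_json build the nesting, while json_normalize undoes it.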

Method 3: Combining csv.DictReader with Recursive Function

For more dynamic and deep nesting, we can combine csv.DictReader with a recursive function that can nest dictionaries based on keys. This allows for a flexible and general solution for multiple levels of nesting.

Here’s an example:

import csv
import json

# Recursive function to create nested dictionaries
def nest_dict(flat_dict, keys):
    if keys and keys[0] in flat_dict:
        key = keys[0]
        # Drop the key being promoted so it is not repeated inside the leaf
        rest = {k: v for k, v in flat_dict.items() if k != key}
        return {flat_dict[key]: nest_dict(rest, keys[1:])}
    return flat_dict

# List of keys defining the nesting order (the CSV is assumed to have these columns)
nesting_keys = ['department', 'team']

with open('employees.csv', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    result = [nest_dict(row, nesting_keys) for row in csv_reader]

print(json.dumps(result, indent=4))

The output will be a JSON array in which each row is nested one level per specified key.

The provided code reads the CSV with csv.DictReader, then uses a recursive function, nest_dict(), to wrap each row in one dictionary level per nesting key. Each row is nested independently, so rows that share a department are not merged into a single object. The result is then formatted as JSON and printed.
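When rows that share the same key values should end up under one shared branch, the rows can be grouped into a single tree instead. The sketch below is one way to do that. It uses a loop rather than recursion, and the sample data and the function name group_rows are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical sample data with 'department' and 'team' columns
SAMPLE_CSV = """name,department,team
Alice,Engineering,Platform
Bob,Engineering,Platform
Carol,Engineering,Data
Dave,Sales,EMEA
"""

def group_rows(rows, keys):
    """Merge rows into one tree keyed by the values of `keys`, in order."""
    tree = {}
    for row in rows:
        node = tree
        # Walk (or create) one dictionary level per key except the last
        for key in keys[:-1]:
            node = node.setdefault(row[key], {})
        # The last key maps to a list holding the remaining fields of each row
        leaf = {k: v for k, v in row.items() if k not in keys}
        node.setdefault(row[keys[-1]], []).append(leaf)
    return tree

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
grouped = group_rows(rows, ['department', 'team'])
print(json.dumps(grouped, indent=4))
```

With this layout, Alice and Bob appear in the same list under Engineering and Platform instead of in two separate single-row objects.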

Method 4: Using Custom Classes and Objects

When the desired JSON structure is complex and needs to include specific data types (like dates or decimals), custom classes in Python can be used to represent the data structure. These classes can then be serialized into JSON.

Here’s an example:

import csv
import json

class Employee:
    def __init__(self, name, age, title, departments, projects):
        self.name = name
        self.details = {'age': age, 'title': title}
        self.departments = departments.split(';')
        self.projects = projects.split(';')

    def to_dict(self):
        # The instance attributes already form the nested structure we want
        return vars(self)

employees = []
with open('employees.csv', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        emp = Employee(row['name'], row['age'], row['title'], row['departments'], row['projects'])
        employees.append(emp)

# Serialize the whole list in one call so the output is a single valid JSON array
print(json.dumps([emp.to_dict() for emp in employees], indent=4))

The output will be a JSON array in which each element is the nested representation of one Employee instance.

Each row from the CSV file is used to instantiate an Employee object, whose constructor builds the nested details, departments, and projects attributes. The to_dict method exposes those attributes as a dictionary, and a single json.dumps call serializes the full list. Serializing in one call matters: dumping each object separately and printing the resulting list yields a Python list of strings, not valid JSON.
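For the specific data types mentioned above, json.dumps accepts a default function that is called for any object it cannot serialize itself. A minimal sketch, assuming two hypothetical fields (hired and salary) that are not in the original CSV, and a helper named encode_extra that is likewise invented for illustration:

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from decimal import Decimal

@dataclass
class Employee:
    name: str
    hired: date        # hypothetical field, not in the original CSV
    salary: Decimal    # hypothetical field, not in the original CSV

def encode_extra(obj):
    """Fallback for types that json.dumps cannot serialize on its own."""
    if isinstance(obj, date):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        # Keep the exact digits; use float(obj) if a JSON number is required
        return str(obj)
    raise TypeError(f'Cannot serialize {type(obj).__name__}')

# Values as they might arrive from a CSV row (all strings)
row = {'name': 'Alice', 'hired': '2021-03-15', 'salary': '85000.50'}
emp = Employee(row['name'], date.fromisoformat(row['hired']), Decimal(row['salary']))

json_output = json.dumps(asdict(emp), default=encode_extra, indent=4)
print(json_output)
```

Parsing the strings into date and Decimal in the class gives you validation and arithmetic on the Python side, while the default function decides how those values are written back out.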

Bonus One-Liner Method 5: Using List Comprehension with csv and json Modules

A concise and compact way to convert a CSV to a nested JSON is by using a combination of list comprehension, csv, and json modules. This is a quick solution for simple transformations.

Here’s an example:

import csv
import json

with open('employees.csv', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    json_output = json.dumps([{'name': row['name'], 'details': {'age': row['age'], 'title': row['title']}} for row in csv_reader], indent=4)
print(json_output)

The output will be a simply nested JSON string with employee names and a details object.

This one-liner iterates over the rows yielded by csv.DictReader and nests each one inside a list comprehension, which is passed straight to json.dumps to produce a JSON formatted string. Only name, age, and title are used here; the departments and projects columns are left out to keep the expression short.

Summary/Discussion

Method 1: csv and json Standard Libraries. This provides manual control and is great for simple customizations. However, it may get verbose for deep nesting.
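Method 2: Pandas. Well suited to column-wise transformations on larger datasets with little code. The downsides are the additional dependency on the Pandas library and the fact that json_normalize flattens nested data rather than creating it, so the nesting still has to be built explicitly.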
Method 3: Recursive Function. Offers flexibility for various levels of deep nesting. The weakness is the potential complexity in understanding and maintaining the recursive code.
Method 4: Custom Classes and Objects. Ideal for complex custom data types but requires more boilerplate code for class definitions and object serialization.
Method 5: One-Liner with List Comprehension. A quick and easy solution for simple nests, but lacks the control for customization and handling more complex structures.