5 Best Ways to Convert CSV to Nested JSON in Python

πŸ’‘ Problem Formulation: Developers frequently need to convert flat structured CSV files into a hierarchical nested JSON format. For instance, a CSV containing employee data with columns for department and position may need to be grouped by departments, with each department containing its list of employees and positions in a JSON structure. This article provides solutions for transforming a simple CSV file into a complex nested JSON output.

Method 1: Using Pandas with groupby()

Pandas is a powerful data manipulation library in Python. This method involves reading the CSV into a DataFrame, grouping the data by certain keys, and finally, converting the grouped data into nested JSON. The to_json() function in Pandas can be used to output the JSON file.

Here’s an example:

import pandas as pd

# Read the CSV file
df = pd.read_csv('employees.csv')

# Group data by 'department' column and convert to JSON
result_json = (df.groupby('department', as_index=False)
                  .apply(lambda x: x.to_dict(orient='records'))
                  .reset_index()
                  .rename(columns={0: 'employees'})
                  .to_json(orient='records'))

print(result_json)

The code grouping by ‘department’ would yield a JSON array where each object contains a department name and an array of employees:

{"department": "Sales", "employees": [{"name": "John", "position": "Manager"}, ...]}

This snippet reads the CSV into a DataFrame, groups the data by department, and then applies a lambda function to convert each group into a list of dictionaries, producing a nested JSON structure.

Method 2: Using csv and json Standard Libraries

This method does not require any external libraries. Python’s built-in csv module is used to read the CSV file and a combination of Python’s data structures along with the json module for the conversion. This method is good for basic CSV to JSON transformations without complex nesting.

Here’s an example:

import csv
import json

# Read the CSV and add data to a dictionary
data = {}
with open('employees.csv', mode='r') as infile:
    reader = csv.reader(infile)
    next(reader)  # Skip the header row
    for rows in reader:
        department = rows[0]
        if department in data:
            data[department].append({'name': rows[1], 'position': rows[2]})
        else:
            data[department] = [{'name': rows[1], 'position': rows[2]}]

# Convert to JSON
nested_json = json.dumps(data, indent=4)
print(nested_json)

The output here will be similar to Method 1, but with the potential of being less elegant for more complex data structures:

{"Sales": [{"name": "John", "position": "Manager"}, ...]}

The code reads the CSV file line by line, creating a dictionary of lists to build the nested structure, which is then converted to a JSON string using the json.dumps() function.

Method 3: Using Recursive Function for Deep Nesting

For deeply nested structures, a recursive function can be written to handle arbitrary levels of nesting. This solution is adaptable and handles more complex requirements than the previous methods, although it requires more setup.

Here’s an example:

import csv
import json

# Function to convert CSV data to nested JSON
def nest_csv(csv_file_path, nest_keys):
    # Read CSV and initiate dictionary
    nested_dict = {}
    with open(csv_file_path, mode='r') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            last = nested_dict
            for key in nest_keys:
                last = last.setdefault(row.pop(key), {})
            last.update(row)
    return nested_dict

# Convert to nested JSON
nested_json = json.dumps(nest_csv('employees.csv', ['department', 'position']), indent=4)
print(nested_json)

The output will have a nested structure according to the specified keys:

{"Sales": {"Manager": {"name": "John"}, ...}}

This method uses a function that iteratively applies setdefault() to build nested dictionaries from the CSV rows. The resulting nested dictionary is then converted to a JSON string.

Method 4: Using csv.DictReader and a Custom Encoding Function

A custom approach using csv.DictReader to read the CSV file into ordered dictionaries, then using a custom encoding function to process the data into a nested JSON structure. This is more manual but gives great control over the conversion process.

Here’s an example:

import csv
import json

# Custom encoder function
def encode_nested_json(data, keys):
    nested_data = {}
    for entry in data:
        current_level = nested_data
        for key in keys:
            if key not in current_level:
                current_level[key] = {}
            current_level = current_level[key]
        current_level.update(entry)
    return nested_data

# Read the CSV file
data = []
with open('employees.csv', mode='r') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        data.append(row)

# Convert and print the nested JSON
nested_json = json.dumps(encode_nested_json(data, ['department']), indent=4)
print(nested_json)

This approach offers a JSON structure with nested departments:

{"Sales": {"name": "John", "position": "Manager", ...}}

The snippet reads the CSV into a list of dictionaries and then uses the custom encoding function to recursively build a nested dictionary that is turned into JSON. This method allows for greater flexibility and handling of complex nesting requirements.

Bonus One-Liner Method 5: Using List Comprehensions and Dictionary Comprehensions

This one-liner approach uses Python’s list comprehensions and dictionary comprehensions for a compact transformation. It’s best suited for simpler tasks where less recursion is needed, and readability is not a primary concern.

Here’s an example:

import csv
import json

with open('employees.csv', mode='r') as infile:
    reader = csv.DictReader(infile)
    nested_json = json.dumps({row['department']: [{"name": r['name'], "position": r['position']} for r in reader] for row in reader}, indent=4)
print(nested_json)

The resulting JSON would group all the employees under their respective departments:

{"Sales": [{"name": "John", "position": "Manager"}, ...]}

This compact code uses dictionary and list comprehensions to construct a nested JSON object directly from the CSV file rows. It’s a quick and elegant solution for simpler nesting but may become unwieldy for more complex data or deeper nesting levels.

Summary/Discussion

  • Method 1: Pandas with GroupBy. Great for data frames. Might be overkill for simple CSV to JSON conversions.
  • Method 2: csv and json Standard Libraries. No external dependencies. Can be less convenient for complex nested structures.
  • Method 3: Recursive Function for Deep Nesting. Highly adaptable for deep and complex nestings. Could be harder to follow for beginners.
  • Method 4: Custom Encoding Function. Offers fine-grained control. More manual setup and potentially more code to manage.
  • Bonus Method 5: List and Dictionary Comprehensions. Quick one-liner. Best for simple architecture and when code brevity is preferred.