5 Best Ways to Convert Complex JSON to CSV in Python

πŸ’‘ Problem Formulation: Converting complex JSON structures into CSV format in Python can be tricky. Developers often face JSON files with nested arrays, objects, or a mix of both, which doesn’t translate directly into the flat structure of a CSV file. This article explores solutions to transform a JSON with hierarchical data, like {"name": "John", "details": {"age": 30, "city": "New York"}}, into a CSV file where each line represents a distinct data record with appropriate headers: name,age,city\nJohn,30,New York.

Method 1: Using pandas json_normalize

Pandas is a powerful Python Data Analysis Library that simplifies data operations. Its json_normalize function is built specifically to flatten semi-structured JSON data into a flat table. You can pass complex JSON objects and specify the record path to extract nested data, and it will create a DataFrame that can then be easily written to a CSV file.

Here’s an example:

import pandas as pd
from pandas.io.json import json_normalize

data = {
    "name": "John",
    "details": {
        "age": 30,
        "city": "New York"
    }
}
df = json_normalize(data)
df.to_csv('output.csv', index=False)

Output:

name,details.age,details.city
John,30,New York

This code snippet constructs a pandas DataFrame from the JSON data and then writes the DataFrame to a CSV file, creating a header row based on the keys from the JSON and including the nested data.

Method 2: Python’s csv Module with Custom Parsing

Python’s built-in csv module can be used to write CSV files. When dealing with complex JSON structures, custom parsing functions can be written to flatten the JSON data into rows of values that are then written to the CSV. This method provides customizability at the expense of writing more code.

Here’s an example:

import json
import csv

def parse_json(data):
    return [data['name'], data['details']['age'], data['details']['city']]

with open('data.json') as json_file, open('output.csv', 'w', newline='') as csv_file:
    data = json.load(json_file)
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['name', 'age', 'city']) # headers
    csv_writer.writerow(parse_json(data))

Ouput:

name,age,city
John,30,New York

This code snippet opens a JSON file, parses its content, flattens the data structure, and writes the flattened data to a CSV file using Python’s csv module.

Method 3: Using pandas with Custom Column Mapping

Pandas allows for even more complex conversions with custom column mappings. You can assign to each column a lambda function to extract and transform the data from the JSON into the format you desire. This can be useful when JSON structures are inconsistent, or you need to apply some pre-processing.

Here’s an example:

import pandas as pd

data = {
    "name": "John",
    "details": {
        "age": 30,
        "city": "New York"
    }
}
df = pd.json_normalize(data)
df['age'] = df['details.age']
df['city'] = df['details.city']
df.drop(columns=['details.age', 'details.city'], inplace=True)
df.to_csv('output.csv', index=False)

Output:

name,age,city
John,30,New York

In this code, we used pandas.json_normalize for flattening the JSON and then manually mapped and renamed the columns. We write the final DataFrame to a CSV, ensuring that only the columns we want are included.

Method 4: Using Python’s json Module and DictWriter

You can also use Python’s json module in combination with csv.DictWriter for an easy way to convert JSON objects. Unlike csv.writer, DictWriter allows you to write dictionaries to a CSV directly, and it’ll automatically handle the CSV headers.

Here’s an example:

import json
import csv

with open('data.json') as json_file, open('output.csv', 'w', newline='') as csv_file:
    data = json.load(json_file)
    headers = ['name', 'age', 'city']
    csv_writer = csv.DictWriter(csv_file, fieldnames=headers)
    csv_writer.writeheader()
    row = {'name': data['name'], 'age': data['details']['age'], 'city': data['details']['city']}
    csv_writer.writerow(row)

Output:

name,age,city
John,30,New York

This snippet makes use of json.load to read JSON data and csv.DictWriter to map dictionary keys to CSV columns, making it a simple and clean solution for exporting JSON to CSV.

Bonus One-Liner Method 5: Using csv.writer with List Comprehension

For the smallest and straightforward JSON objects, it’s possible to write a one-liner using csv.writer and a list comprehension. This doesn’t support nested JSON structures but is a quick way for flat JSON data.

Here’s an example:

import json
import csv

with open('simple_data.json') as json_file, open('output.csv', 'w', newline='') as csv_file:
    data = json.load(json_file)
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(data.keys())
    csv_writer.writerow([data[key] for key in data])

Output:

name,age,city
John,30,New York

This code uses a one-liner list comprehension to extract values from a flat JSON object and writes them, along with the keys as headers, to a CSV file with Python’s csv module.

Summary/Discussion

  • Method 1: Using pandas json_normalize. Strengths: Easy to use and powerful for nested data. Weaknesses: Requires external library and may be overkill for simple JSON structures.
  • Method 2: Python’s csv Module with Custom Parsing. Strengths: Full control over data parsing with no external dependencies. Weaknesses: Requires more code and is less flexible for variances in data structure.
  • Method 3: Using pandas with Custom Column Mapping. Strengths: Provides great flexibility with data transformation prior to CSV output. Weaknesses: More complex code compared to json_normalize and can be inefficient for large datasets.
  • Method 4: Using Python’s json Module and DictWriter. Strengths: Direct mapping from JSON to CSV with minimal code. Weaknesses: Doesn’t handle deeply nested structures.
  • Method 5: Using csv.writer with List Comprehension. Strengths: Extremely concise for simple JSON data. Weaknesses: Not suitable for nested JSON structures and lacks flexibility.