5 Best Ways to Read JSON Data from a File and Convert It to DataFrame CSV Files in Python

Rate this post

πŸ’‘ Problem Formulation: In this article, we tackle the specific challenge of taking a JSON formatted file and transforming its contents into a CSV format using Python. For example, we may have a file, data.json, containing a set of records represented in JSON. Our task is to read this file, interpret the JSON data, and then output a data.csv file containing all the records in a structured, comma-separated format.

Method 1: Using Pandas

Pandas is a powerful data manipulation library in Python, perfect for data analysis and cleaning. Method 1 involves reading a JSON file using the pandas.read_json() function and converting the resulting DataFrame to a CSV file with the DataFrame.to_csv() method. This approach is straightforward and well-suited for nested JSON structures.

Here’s an example:

import pandas as pd

# Read JSON data
df = pd.read_json('data.json')

# Save to CSV
df.to_csv('data.csv', index=False)

Output: A file named data.csv saved in the current directory.

This code snippet reads the JSON data using the read_json() function and stores it in a DataFrame object. Then, it writes the DataFrame to a CSV file, excluding the index column, with the to_csv() function. This method quickly handles the conversion task with just a couple lines of code.

Method 2: Using Python’s Built-in JSON and CSV Libraries

For those who prefer not to use external libraries, Python’s built-in json and csv modules can also be used. This method entails reading the JSON file with the json.load() function and then writing the parsed JSON data to a CSV file using csv.writer(). It’s ideal for simple, flat JSON files.

Here’s an example:

import json
import csv

# Open the JSON file and load data
with open('data.json', 'r') as json_file:
    json_data = json.load(json_json_file)

# Write to CSV
with open('data.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(json_data[0].keys())  # header
    for item in json_data:
        writer.writerow(item.values())

Output: A file named data.csv with headers and values.

The json.load() function reads the JSON file and the csv.writer() writes the keys as headers, followed by each row’s values. The data is directly written to the CSV without additional data structure intermediaries. Note that this method assumes the JSON structure is a list of dictionaries, each representing a row of data.

Method 3: Using Json_normalize for Nested JSON

When dealing with nested JSON data, the pandas.json_normalize() function can flatten the JSON object into a table. You can then utilize the dataframe’s to_csv() method to convert the table into a CSV file. This method addresses complexities with nested JSON structures.

Here’s an example:

import pandas as pd
import json

# Load nested JSON data
with open('nested_data.json') as json_file:
    nested_json_data = json.load(json_file)

# Normalize and convert to CSV
df = pd.json_normalize(nested_json_data)
df.to_csv('nested_data.csv', index=False)

Output: A file named nested_data.csv representing the flat data from the nested JSON.

This snippet opens a nested JSON file, flattens it using json_normalize(), and exports the resulting DataFrame to CSV format. json_normalize() effectively handles nested dictionaries or lists within the JSON structure by creating columns that represent each level of hierarchy in the JSON.

Method 4: Using the json2csv Python Package

For a method dedicated solely to this conversion process, you can utilize the json2csv package. The primary advantage is that it can handle arrays and deeply nested JSON objects more robustly than the standard CSV library. This package can be installed via pip.

Here’s an example:

from json2csv import Json2Csv

# Convert and save JSON to CSV
converter = Json2Csv(input_json_path='data.json', output_csv_path='data.csv')
converter.convert()

Output: The `data.csv` file is created with the converted data.

In this code, we first create a Json2Csv instance by providing paths for the input and output files. Calling the convert() method processes the JSON file and saves the converted data to the specified CSV file. This specialized tool eases the conversion process for more complex JSON data.

Bonus One-Liner Method 5: Using pandas with One Line of Code

For those seeking the absolute minimum code approach, this bonus one-liner uses Pandas to read and immediately write the DataFrame to a CSV file. This is the quickest method if you’re dealing with uncomplicated data structures.

Here’s an example:

import pandas as pd; pd.read_json('data.json').to_csv('data.csv', index=False)

Output: A compact, one-liner version of the `data.csv` file.

This concise line of code combines reading and writing to CSV in a single line, offering a no-frills approach to simple JSON conversion tasks.

Summary/Discussion

  • Method 1: Pandas. Simple and effective for both flat and nested JSON files. Requires the Pandas library installation. Might be overkill for very simple conversion tasks.
  • Method 2: JSON and CSV Libraries. Good for simple, flat files. No external dependencies required. Not as convenient for nested or complex JSON structures.
  • Method 3: Json_normalize. Excellent for nested and complex JSON. Relies on Pandas functionality and requires additional syntax to handle the normalizing process.
  • Method 4: json2csv Package. Specialized tool capable of handling complex JSON data. Requires external package installation. May offer more robust options for complex data.
  • Bonus Method 5: Pandas One-Liner. Quickest and most straightforward method for simple JSON files. Not suitable for nested or complex JSON data.