5 Efficient Ways to Create a pandas DataFrame from JSON

πŸ’‘ Problem Formulation: In data analysis, it’s common practice to convert JSON data into pandas DataFrames. The conversion process can be crucial for preparing data for analysis, especially when dealing with JSON which is widely used in web APIs and config files. For instance, converting a JSON array of user records into a DataFrame for analysis. The goal is to demonstrate how to perform this conversion efficiently using pandas.

Method 1: Using pd.read_json()

For loading JSON data directly from a file or a JSON string, pandas offer the pd.read_json() function. It parses the JSON input into a DataFrame, recognizes multiple orientations, and interprets nested JSON as objects within the DataFrame.

Here’s an example:

import pandas as pd

json_string = '{"name": "John", "age": 30, "city": "New York"}'
df = pd.read_json(json_string, orient='index')

print(df)

Output:

             0
name      John
age         30
city  New York

This code demonstrates the creation of a DataFrame by parsing a simple JSON string. The orient='index' parameter tells pandas to use dictionary keys as row labels.

Method 2: Using json_normalize()

To convert a nested JSON object into a flat table, pandas provides the json_normalize() function. It is particularly useful for JSON objects with nested arrays or dictionaries.

Here’s an example:

from pandas import json_normalize

nested_json = {
    "name": "Jane",
    "location": {
        "city": "Los Angeles",
        "state": "CA"
    }
}
df_normalized = json_normalize(nested_json)

print(df_normalized)

Output:

   name location.city location.state
0  Jane   Los Angeles             CA

This snippet uses json_normalize() to convert nested JSON into a DataFrame, creating a flat structure where the keys become column names.

Method 3: From a List of JSON objects

Casting a list of JSON objects directly into a DataFrame is straightforward if each list item corresponds to a row in the DataFrame.

Here’s an example:

import pandas as pd

json_list = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 35}
]
df_from_list = pd.DataFrame(json_list)

print(df_from_list)

Output:

      name  age
0    Alice   25
1      Bob   30
2  Charlie   35

This code constructs a DataFrame from a list of dictionaries, with each dictionary representing a JSON object and each key-value pair defining a column and a data point respectively.

Method 4: Using pd.io.json.loads() and pd.DataFrame()

If your JSON data is received as a string, you can first parse it using pd.io.json.loads() and then create a DataFrame with pd.DataFrame().

Here’s an example:

import pandas as pd
from pandas.io.json import json_normalize

json_str = '[{"name": "Dave", "age": 40}, {"name": "Eve", "age": 45}]'
json_data = json.loads(json_str)
df_from_json_str = pd.DataFrame(json_data)

print(df_from_json_str)

Output:

   name  age
0  Dave   40
1   Eve   45

This method involves converting the JSON string into a list of dictionary objects, which pd.DataFrame() then readily converts into a DataFrame.

Bonus One-Liner Method 5: Using List Comprehension with pd.DataFrame()

If you are dealing with a relatively simple JSON structure and prefer a one-liner, you can use list comprehension combined with pd.DataFrame() to build your DataFrame.

Here’s an example:

import pandas as pd

df_one_liner = pd.DataFrame([{"name": "Frank", "age": 50}, {"name": "Grace", "age": 45}])

print(df_one_liner)

Output:

    name  age
0  Frank   50
1  Grace   45

This compact approach directly passes the JSON array into pd.DataFrame(), immediately creating the DataFrame without additional steps.

Summary/Discussion

  • Method 1: Using pd.read_json(). Best for JSON strings or file inputs. Not ideal for highly nested JSON.
  • Method 2: Using json_normalize(). Ideal for nested JSON structures. May require additional parameters for complex data.
  • Method 3: From a list of JSON objects. Streamlined for lists of dictionaries. Limited to non-nested JSON.
  • Method 4: Using pd.io.json.loads() and pd.DataFrame(). Versatile for JSON strings. Involves two-step conversion.
  • Bonus Method 5: Using list comprehension with pd.DataFrame(). Quick one-liner for simple JSON arrays. May not be suitable for complex data transformations.