5 Best Ways to Convert Python Nested Dicts into DataFrames

💡 Problem Formulation: Converting nested dictionaries to Pandas DataFrames can be a challenging but common task in data manipulation and analysis in Python. Users often need to transform data stored in a multi-level nested dictionary, which may be returned from an API or generated within the code, into a tabular DataFrame structure for easier analysis and visualization. The desired output is a Pandas DataFrame with the nested structures effectively flattened into columns and rows.
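
To see why flattening is needed, here is a minimal sketch of what a naive conversion produces. It assumes the same sample dictionary used in the methods below; the variable name naive_df is only illustrative.

```python
import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

# orient='index' turns each outer key into a row label, but the inner
# 'info' dictionaries are stored as whole objects in a single column.
naive_df = pd.DataFrame.from_dict(nested_dict, orient='index')

print(naive_df.columns.tolist())
print(type(naive_df.loc['ID001', 'info']))
```

The 'info' column still holds dictionaries rather than separate age and city columns, which is the problem the methods in this article address.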

Method 1: Using pandas.json_normalize()

Pandas offers a convenient function, pandas.json_normalize(), that flattens nested dictionaries and turns them into a DataFrame. It is designed for nested JSON data but works equally well with Python dictionaries. Given a list of records, it unpacks each nested level into separate columns (joining parent and child keys with a dot by default), making the resulting DataFrame easy to work with.

Here's an example:

import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

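# json_normalize() treats each dictionary in a list as one row, so turn the
# outer keys into an explicit "ID" field before normalizing.
records = [{"ID": key, **value} for key, value in nested_dict.items()]
df = pd.json_normalize(records)  # columns: ID, name, info.age, info.city
df.columns = ['ID', 'Name', 'Age', 'City']
print(df)

The output will be:

      ID  Name  Age         City
0  ID001  John   30     New York
1  ID002  Jane   25  Los Angeles

The code snippet first turns each outer key into an ID field, producing a list of records, and then employs pandas.json_normalize() to flatten them, which yields the columns ID, name, info.age and info.city. Passing the dictionary directly would not give one row per ID, because json_normalize() would treat the whole dictionary as a single record. Column names are then set manually to reflect the structure of the data.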
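
By default, json_normalize() joins nested keys with a dot (for example info.age). The sketch below, which assumes the same sample data, passes the outer keys in as an ID field and uses the sep parameter to get underscore-separated column names instead. The records variable is only illustrative.

```python
import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

# Turn the outer keys into an explicit "ID" field so each entry is one record.
records = [{"ID": key, **value} for key, value in nested_dict.items()]

# sep controls how parent and child keys are joined in the column names.
df = pd.json_normalize(records, sep="_")
print(df.columns.tolist())
```

This avoids renaming columns by hand when dotted names are inconvenient, for instance when columns are later accessed as attributes.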

Method 2: Manual Unpacking of Nested Dictionaries

If you prefer more control over the flattening process or need to perform complex transformations, manual unpacking of nested dictionaries can be a good choice. This involves iterating over the dictionary items and populating a new DataFrame with the rows and columns that you define explicitly.

Here's an example:

import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

data = []
for key, value in nested_dict.items():
    row = {'ID': key}
    row.update(value)
    row.update(value['info'])
    del row['info']
    data.append(row)

df = pd.DataFrame(data)
print(df)

The output will be:

      ID  name  age         city
0  ID001  John   30     New York
1  ID002  Jane   25  Los Angeles

Here we manually iterate over the nested dictionary, merge the outer and inner dictionaries while removing the 'info' key, and then populate a list of row dictionaries. Finally, we create a DataFrame from the list using pd.DataFrame().
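
One benefit of the manual approach is that irregular records are easy to handle. The sketch below assumes a hypothetical third record, ID003, that has no 'info' key at all. It is one possible way to guard against that case, not the only one.

```python
import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}},
    "ID003": {"name": "Alex"},  # hypothetical record without 'info'
}

data = []
for key, value in nested_dict.items():
    row = {"ID": key}
    # Copy every top-level field except the nested 'info' dictionary.
    row.update({k: v for k, v in value.items() if k != "info"})
    # Fall back to an empty dict so a missing 'info' does not raise KeyError.
    row.update(value.get("info", {}))
    data.append(row)

df = pd.DataFrame(data)
print(df)
```

Rows that lack a field end up with missing values (NaN) in the corresponding columns, so the DataFrame keeps a consistent shape.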

Method 3: Using Recursion

Another method for converting nested dictionaries to a DataFrame is using recursion to handle arbitrary levels of nesting. Recursive functions can process each level of the dictionary and return a flattened structure suitable for DataFrame creation.

Here's an example:

import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": {"name": "New York", "population": 8419000}}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": {"name": "Los Angeles", "population": 3990000}}}
}

def flatten_dict(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

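# Flatten each record on its own and store the outer key in an 'ID' field.
# (Passing the key as parent_key would prefix every column with the ID.)
flattened_data = [{'ID': key, **flatten_dict(val)} for key, val in nested_dict.items()]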
df = pd.DataFrame(flattened_data)
print(df)

The output will be:

      ID  name  info_age  info_city_name  info_city_population
0  ID001  John        30        New York               8419000
1  ID002  Jane        25     Los Angeles               3990000

This code snippet defines a recursive function flatten_dict() that traverses the nested dictionary and collects key-value pairs into a flat dictionary, joining parent and child keys with the separator (an underscore by default). These flat dictionaries are then used to construct the DataFrame.
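
To make the key-building logic concrete, the sketch below repeats flatten_dict() and calls it on a single record with a different separator. The record and flat variable names are illustrative.

```python
def flatten_dict(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        # Prefix the key with its parent's key unless we are at the top level.
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            # Recurse into nested dictionaries, carrying the prefix along.
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

record = {
    "name": "John",
    "info": {"age": 30, "city": {"name": "New York", "population": 8419000}},
}

flat = flatten_dict(record, sep='.')
print(flat)
# {'name': 'John', 'info.age': 30, 'info.city.name': 'New York', 'info.city.population': 8419000}
```

Each level of nesting adds one more segment to the key, which is why the column names grow longer as the data gets deeper.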

Method 4: Using pandas.DataFrame.from_dict() with Custom Function

You can also use pandas.DataFrame.from_dict() in combination with a custom function for unpacking. This method leverages the flexibility of a custom function to shape the data and pandas.DataFrame.from_dict() to create the DataFrame directly from the dictionary.

Here's an example:

import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

def unpack_dict(d):
    result = {}
    for k, v in d.items():
        if isinstance(v, dict):
            result.update({f"{k}_{subkey}": subvalue for subkey, subvalue in v.items()})
        else:
            result[k] = v
    return result

df = pd.DataFrame.from_dict({k: unpack_dict(v) for k, v in nested_dict.items()}, orient='index').reset_index()
df.rename(columns={'index': 'ID'}, inplace=True)
print(df)

The output will be:

      ID  name  info_age    info_city
0  ID001  John        30     New York
1  ID002  Jane        25  Los Angeles

This code utilizes a custom unpacking function unpack_dict() that specifically addresses the structure of the nested dictionary. pandas.DataFrame.from_dict() is then used to create the DataFrame, with orient='index' followed by a reset of the index to include the ID as a column.
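
As an alternative to renaming the column after reset_index(), the index can be named first. The sketch below assumes rows that are already flattened (the flat_rows name is illustrative) and uses rename_axis() for this.

```python
import pandas as pd

# Hypothetical, already-flattened rows keyed by ID.
flat_rows = {
    "ID001": {"name": "John", "info_age": 30, "info_city": "New York"},
    "ID002": {"name": "Jane", "info_age": 25, "info_city": "Los Angeles"},
}

# orient='index' treats each outer key as a row label.
df = pd.DataFrame.from_dict(flat_rows, orient='index')

# Naming the index first means reset_index() creates an 'ID' column directly.
df = df.rename_axis('ID').reset_index()
print(df.columns.tolist())
```

This keeps the whole transformation in a single method chain and avoids inplace=True.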

Bonus One-Liner Method 5: Comprehensions and pandas.DataFrame()

For a quick and concise conversion, a single list comprehension can transform the nested dictionary into a list of flat dictionaries, which is then passed to pd.DataFrame().

Here's an example:

import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

df = pd.DataFrame([{'ID': k, **v, **v['info']} for k, v in nested_dict.items()])
del df['info']
print(df)

The output will be:

      ID  name  age         city
0  ID001  John   30     New York
1  ID002  Jane   25  Los Angeles

This compact code snippet flattens the nested dictionary using a list comprehension that merges the keys and values inline. The ** operator is used to unpack the dictionary entries. The 'info' column is then deleted, leaving a clean DataFrame.
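
If the separate deletion step is unwanted, the 'info' key can be filtered out inside the comprehension itself. This is a sketch of one way to do it, assuming the same sample data and that every record has an 'info' dictionary.

```python
import pandas as pd

nested_dict = {
    "ID001": {"name": "John", "info": {"age": 30, "city": "New York"}},
    "ID002": {"name": "Jane", "info": {"age": 25, "city": "Los Angeles"}}
}

# Keep every top-level field except 'info', then unpack 'info' alongside it.
df = pd.DataFrame([
    {'ID': k, **{key: val for key, val in v.items() if key != 'info'}, **v['info']}
    for k, v in nested_dict.items()
])
print(df.columns.tolist())
```

The nested dictionary never becomes a column, so nothing has to be removed afterwards.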

Summary/Discussion

  • Method 1: pandas.json_normalize(). Ideal for handling standard JSON-like nested dictionaries. It's efficient but expects a record or a list of records, so the outer keys may need reshaping first.
  • Method 2: Manual Unpacking. Provides full control over the transformation process. It is flexible but requires more code and can be error-prone if not carefully implemented.
  • Method 3: Recursion. Suitable for dictionaries nested to an arbitrary depth. It's a powerful technique but can be harder to read and debug than the other methods.
  • Method 4: pandas.DataFrame.from_dict() with Custom Function. Combines a custom approach with a Pandas constructor. Offers customization but is less concise than other methods.
  • Bonus One-Liner Method 5: Comprehensions with pandas.DataFrame(). Provides a quick, elegant solution. It's concise but only handles a single, known level of nesting.