5 Best Ways to Convert a List of Lists into a Tree-Like Dictionary in Python

💡 Problem Formulation: In Python, efficiently transforming a nested list structure into a tree-like dictionary representation can be crucial for manipulating hierarchical data. Consider the input as a list of lists, e.g. [["a", "b", "c"], ["a", "b", "d"]], where each sublist represents a path within the tree. The desired output is a nested dictionary that represents these paths as a tree, such as {"a": {"b": {"c": {}, "d": {}}}}.

Method 1: Recursive Function

Using a recursive function to build the tree dictionary is a straightforward approach. With this method, you create a function that iteratively inserts each path from the list of paths into the dictionary. This function is called recursively to process subpaths until the entire path is inserted. This technique is effective for clarity and maintainability. It handles deeply nested structures well but may not be the most efficient with extremely large datasets due to function call overhead.

Here’s an example:

def insert_path(tree, path):
    if not path:
        return
    node = path.pop(0)
    if node not in tree:
        tree[node] = {}
    insert_path(tree[node], path)

paths = [["a", "b", "c"], ["a", "b", "d"]]
tree = {}
for path in paths:
    insert_path(tree, path.copy())  # Use copy to avoid modifying the original list

print(tree)

The output:

{"a": {"b": {"c": {}, "d": {}}}}

This code defines a function insert_path() that takes a tree and a path, and inserts the path into the tree by recursively creating nested dictionaries. It is then utilized in a loop that processes each path of our input list.

Method 2: Iterative Approach

The iterative approach avoids recursion by using loops to build the nested dictionary structure. It iterates over each path and for each segment of the path, it ensures there is a corresponding nested dictionary. This approach avoids the potential stack overflow issues with large datasets that might affect a recursive implementation and might be faster for moderate-sized trees due to lower function call overhead.

Here’s an example:

paths = [["a", "b", "c"], ["a", "b", "d"]]
tree = {}

for path in paths:
    current_level = tree
    for part in path:
        if part not in current_level:
            current_level[part] = {}
        current_level = current_level[part]

print(tree)

The output:

{"a": {"b": {"c": {}, "d": {}}}}

The code snippet creates a nested dictionary called tree and builds it up by iterating through each segment of each path. The variable current_level refers to the dictionary at the current depth in the tree.

Method 3: Using a Default Dictionary

This method employs Python’s collections.defaultdict to create a tree-like structure. The defaultdict factory function simplifies the code by removing the need to check for the existence of a dictionary at each node. This method produces neat and easily readable code and can sometimes outperform the plain dict approach when dealing with large lists.

Here’s an example:

from collections import defaultdict
import json  # Just for pretty printing in this example

def tree():
    return defaultdict(tree)

paths = [["a", "b", "c"], ["a", "b", "d"]]
tree_root = tree()

for path in paths:
    current_dict = tree_root
    for node in path:
        current_dict = current_dict[node]

print(json.dumps(tree_root, indent=4))  # Pretty print the nested defaultdict structure

The output:

{
    "a": {
        "b": {
            "c": {},
            "d": {}
        }
    }
}

The snippet uses defaultdict to create a function tree() that always generates a new defaultdict when a missing node is accessed. The rest of the code builds the tree structure iteratively in a similar way to method 2.

Method 4: Using a Trie Implementation

A variant of the tree dictionary is a trie (prefix tree). While not strictly the same, it can also represent a list of lists as a tree structure. A trie focuses on the common prefixes of the paths and efficiently stores them. This structure is particularly useful for autocomplete systems or in cases where the prefixes within the paths are relevant to the application.

Here’s an example:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

    def insert(self, path):
        current_node = self
        for part in path:
            if part not in current_node.children:
                current_node.children[part] = TrieNode()
            current_node = current_node.children[part]
        current_node.is_end_of_word = True

trie = TrieNode()
for path in [["a", "b", "c"], ["a", "b", "d"]]:
    trie.insert(path)

def trie_to_dict(node):
    node_dict = {}
    for child, child_node in node.children.items():
        node_dict[child] = trie_to_dict(child_node)
    return node_dict

print(trie_to_dict(trie))

The output:

{"a": {"b": {"c": {}, "d": {}}}}

This code defines a TrieNode class that represents a node in a trie, and a trie_to_dict() function that converts the trie into a dictionary. Each path is inserted into the trie using the insert() method.

Bonus One-Liner Method 5: Using a Nested Dictionary Comprehension

For those who love Python’s one-liners, this method employs a nested dictionary comprehension to achieve a tree-like structure. It does so by iterating over the paths and merging them into a single dictionary. This method is by far the most compact. However, readability may suffer, as nested comprehensions can be difficult to understand at a glance.

Here’s an example:

paths = [["a", "b", "c"], ["a", "b", "d"]]

tree = {x: {y: {z: {} for z in paths if z[:2] == [x, y]} for y in set(p[1] for p in paths if p[0] == x)} for x in set(p[0] for p in paths)}

print(tree)

The output:

{"a": {"b": {"c": {}, "d": {}}}}

The one-liner uses dictionary comprehensions to build three layers of dictionaries based on the unique values in the paths at each level. It uses set comprehension to get unique first and second elements of the paths for higher levels of the tree.

Summary/Discussion

Method 1: Recursive Function. Strengths: Easy to understand and maintain, especially with deeply nested structures. Weaknesses: Might not perform well with extremely large datasets due to function call overhead.
Method 2: Iterative Approach. Strengths: Can be more efficient than the recursive approach for moderate-sized datasets due to less overhead. Weaknesses: Might become complicated if there is deep nesting of the structures.
Method 3: Using a Default Dictionary. Strengths: Simplifies the code by abstracting the checking for node existence, potentially better performance with large datasets. Weaknesses: The ubiquitous use might create very deep nested structure which can be a bit difficult to debug.
Method 4: Using a Trie Implementation. Strengths: Efficiently deals with common prefixes and thus can save space and time complexity. Weaknesses: Might be overkill for small datasets and the trie structure is different from a regular dict structure.
Bonus Method 5: One-Liner Using Nested Dictionary Comprehension. Strengths: Very concise code. Weaknesses: Poor readability and may not be as performant on large datasets.