Converting a Python List of Dicts to a Set: 5 Effective Methods

💡 Problem Formulation:

When working with data in Python, a common task is to convert a list of dictionaries into a set, either to eliminate duplicates or to perform set operations. Suppose you have the list [{"key1": "value1"}, {"key2": "value2"}, {"key1": "value1"}] and want to drop the repeated entry, leaving only the two unique dictionaries {"key1": "value1"} and {"key2": "value2"}.
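
Passing the list straight to set() does not work, because dictionaries are mutable and therefore unhashable. The short sketch below only illustrates that failure; the variable error_message is an illustrative name, and the exact wording of the message can differ between Python versions.

```python
list_of_dicts = [{"key1": "value1"}, {"key2": "value2"}, {"key1": "value1"}]

try:
    set(list_of_dicts)  # each element must be hashable, and a dict is not
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The message mentions an unhashable type; its exact text depends on the Python version.
print(error_message)
```

Every method that follows works around this by first converting each dictionary into a hashable stand-in.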

Method 1: Using Set Comprehension With Tuples

This method converts each dictionary into a tuple of its key-value pairs and collects those tuples with a set comprehension. A set cannot contain dictionaries directly because they are mutable and therefore unhashable, but it can contain tuples, provided every key and value inside them is itself hashable.

Here’s an example:

list_of_dicts = [{"color": "blue"}, {"color": "red"}, {"color": "blue"}]
set_of_tuples = {tuple(d.items()) for d in list_of_dicts}
print(set_of_tuples)

Output:

{(('color', 'red'),), (('color', 'blue'),)}

The code takes each dictionary in the list, turns it into a tuple of items (each item being a key-value pair), and creates a set out of these tuples. As sets automatically remove duplicates, only unique tuples remain.
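
Because tuple(d.items()) follows insertion order, two dictionaries holding the same pairs in a different key order produce different tuples. The sketch below shows one possible way around that by sorting the items first, and also converts the result back into dictionaries. It assumes the keys are mutually comparable (for example, all strings); the two-key sample data and the names unique_tuples and unique_dicts are illustrative.

```python
list_of_dicts = [
    {"color": "blue", "size": "M"},
    {"size": "M", "color": "blue"},  # same content, different key order
    {"color": "red", "size": "L"},
]

# Sorting the items makes each tuple independent of insertion order.
unique_tuples = {tuple(sorted(d.items())) for d in list_of_dicts}

# dict() accepts an iterable of key-value pairs, so each tuple converts back.
unique_dicts = [dict(t) for t in unique_tuples]

print(len(unique_tuples))
# 2
```

Without sorted(), the same input would yield three tuples, because the first two dictionaries list their keys in a different order.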

Method 2: Using a For Loop and Frozensets

Frozensets are immutable, hashable sets, so they can be stored inside a regular set. This method loops over the list of dictionaries, converts each one to a frozenset of its items, and adds it to a set. Because a frozenset ignores element order, dictionaries with the same pairs in a different key order count as duplicates. As with tuples, every value must itself be hashable.

Here’s an example:

list_of_dicts = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Alice"}]
unique_set = set()

for d in list_of_dicts:
    unique_set.add(frozenset(d.items()))

print(unique_set)

Output:

{frozenset({('name', 'Alice')}), frozenset({('name', 'Bob')})}

The loop builds a frozenset from the items of each dictionary and adds it to the set. Once the loop finishes, duplicate dictionaries have been filtered out.
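
If the goal is a deduplicated list of dictionaries, as in the problem formulation, the frozensets can serve purely as lookup keys while the original dictionaries are kept. The following is a minimal sketch under that assumption; the names seen and unique_dicts are illustrative and not part of the original example.

```python
list_of_dicts = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Alice"}]

seen = set()        # frozensets of items already encountered
unique_dicts = []   # first occurrence of each distinct dict, in input order

for d in list_of_dicts:
    key = frozenset(d.items())
    if key not in seen:
        seen.add(key)
        unique_dicts.append(d)

print(unique_dicts)
# [{'name': 'Alice'}, {'name': 'Bob'}]
```

This variant also preserves the order in which the dictionaries first appear, which a plain set does not.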

Method 3: Extracting the Values of a Shared Key into a Set

If every dictionary has the same single key and differs only in its value, you can simply collect the values into a set. The result is a set of values rather than a set of dictionaries, and the method assumes each dictionary holds exactly one key-value pair under that shared key.

Here’s an example:

list_of_dicts = [{"id": 1}, {"id": 2}, {"id": 1}]
set_of_values = {d["id"] for d in list_of_dicts}

print(set_of_values)

Output:

{1, 2}

The set comprehension here extracts the value associated with the key ‘id’ from each dictionary and produces a set of these values, removing any duplicates.
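
The lookup d["id"] raises a KeyError when a dictionary lacks the key. As a hedged sketch, a membership test in the comprehension can skip such entries; the extra dictionary without an "id" key is made up for illustration.

```python
list_of_dicts = [{"id": 1}, {"id": 2}, {"name": "no id here"}, {"id": 1}]

# The membership test skips dictionaries that do not contain the "id" key.
set_of_values = {d["id"] for d in list_of_dicts if "id" in d}

print(set_of_values)
# {1, 2}
```

Whether silently skipping incomplete entries is appropriate depends on the data; in some cases letting the KeyError surface may be the better choice.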

Method 4: Using JSON

When dictionaries contain unhashable types like lists, JSON strings can be a workaround. This method converts each dictionary to a JSON string, collects them into a set, and converts back to a dictionary if needed.

Here’s an example:

import json

list_of_dicts = [{"data": [1, 2, 3]}, {"data": [4, 5, 6]}, {"data": [1, 2, 3]}]
set_of_json = {json.dumps(d, sort_keys=True) for d in list_of_dicts}

print(set_of_json)

Output:

{'{"data": [1, 2, 3]}', '{"data": [4, 5, 6]}'}

Serializing each dictionary to a JSON string with sorted keys eliminates duplicates, because dictionaries with identical content yield identical strings regardless of key order. The JSON strings can then be turned back into dictionaries if required.
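
The conversion back to dictionaries can be sketched with json.loads(). This is an illustrative continuation of the example rather than a definitive recipe: it assumes the data survives a JSON round trip unchanged, which is not true of tuples (they come back as lists) or non-string keys (they come back as strings). The name unique_dicts is illustrative.

```python
import json

list_of_dicts = [{"data": [1, 2, 3]}, {"data": [4, 5, 6]}, {"data": [1, 2, 3]}]
set_of_json = {json.dumps(d, sort_keys=True) for d in list_of_dicts}

# sorted() gives a repeatable order; the set itself has none.
unique_dicts = [json.loads(s) for s in sorted(set_of_json)]

print(unique_dicts)
# [{'data': [1, 2, 3]}, {'data': [4, 5, 6]}]
```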

Bonus One-Liner Method 5: Using functools and reduce

The reduce() function from Python’s functools can be used to apply a function of two arguments cumulatively to the items of a sequence. This method is not as straightforward but is a more ‘functional programming’ approach to achieve the result.

Here’s an example:

from functools import reduce

list_of_dicts = [{"id": 3}, {"id": 1}, {"id": 3}, {"id": 2}]
set_of_dicts = reduce(lambda s, d: s.union({frozenset(d.items())}), list_of_dicts, set())

print(set_of_dicts)

Output:

{frozenset({('id', 1)}), frozenset({('id', 2)}), frozenset({('id', 3)})}

The reduce function starts with an empty set and unions it with sets created from each dictionary, effectively removing duplicates and creating a set of immutable frozensets.
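
As a quick sanity check, the reduce() call should produce the same set as a plain set comprehension over frozensets. The sketch below compares the two; the names via_reduce and via_comprehension are illustrative.

```python
from functools import reduce

list_of_dicts = [{"id": 3}, {"id": 1}, {"id": 3}, {"id": 2}]

# Functional style: fold each dict's frozenset into an accumulating set.
via_reduce = reduce(
    lambda s, d: s.union({frozenset(d.items())}), list_of_dicts, set()
)

# Equivalent set comprehension.
via_comprehension = {frozenset(d.items()) for d in list_of_dicts}

print(via_reduce == via_comprehension)
# True
```

Note that s.union() returns a new set on every step, so the reduce() version copies the accumulated set repeatedly, whereas the comprehension builds the set once.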

Summary/Discussion

  • Method 1: Set Comprehension with Tuples. Concise. Sensitive to key insertion order unless the items are sorted, and requires hashable values. Each tuple can be turned back into a dict with dict().
  • Method 2: For Loop with Frozensets. More explicit than a set comprehension, but slightly more verbose. Insensitive to key order and requires hashable values. Each frozenset can be turned back into a dict with dict().
  • Method 3: Values of a Shared Key as a Set. Most effective when all dicts have the same single key. Produces a set of values rather than dicts, and cannot be used for dicts with multiple or differing keys.
  • Method 4: Using JSON Strings. Useful for complex, nested dictionaries. Involves extra processing to convert back to dicts. Can be costly performance-wise with large data.
  • Method 5: Using functools and reduce. Elegant functional programming technique. May be less readable for those not familiar with the approach.