5 Best Ways to Make a Python List of Dicts Unique by Key

💡 Problem Formulation: You're working with a list of dictionaries in Python, and you need a way to remove duplicates based on a specific key while preserving the original order. For example, given the input [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 1, 'name': 'Alice'}], the desired output is [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}] when making the list unique by the 'id' key.

Method 1: Loop with a Temporary Dictionary

This method uses a standard for-loop along with a temporary dictionary. The temporary dictionary stores the values of the key of interest as its keys and the corresponding dictionaries as its values. A duplicate overwrites the entry already stored, so only one dictionary per key survives (the last one seen). Because dicts keep insertion order in Python 3.7+, each key stays at the position of its first occurrence.

Here's an example:

items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'}
]
unique_by_key = {}
for item in items:
    unique_by_key[item['id']] = item
result = list(unique_by_key.values())

Output:

[{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]

This code snippet iterates over each dictionary in the list items, using the 'id' key of each dictionary as a unique identifier. If a duplicate is encountered, the existing dictionary in unique_by_key is replaced. Finally, the values of the temporary dictionary are converted back into a list to achieve the result.
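
The loop above keeps the last dictionary seen for each 'id'. If the first occurrence should win instead, one option is dict.setdefault(), which stores a value only when the key is absent. This is a minimal sketch; the differing 'name' values are made up here purely to make the difference visible.

```python
# Hypothetical data: the two entries with id 1 differ so the winner is visible.
items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alicia'},
]

first_by_key = {}
for item in items:
    # setdefault() inserts only if the key is missing, so the first item wins.
    first_by_key.setdefault(item['id'], item)

first_wins = list(first_by_key.values())
print(first_wins)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

With plain assignment instead of setdefault(), the same input would yield 'Alicia' for id 1.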

Method 2: List Comprehension with Seen Set

In this method, we combine a list comprehension with a set to track keys seen so far. As we iterate through the list of dictionaries, we only include those entries with keys that haven't been seen, so the first occurrence of each key is kept. This is a concise approach but requires keeping track of seen keys in a separate data structure.

Here's an example:

items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'}
]
seen = set()
result = [seen.add(item['id']) or item for item in items if item['id'] not in seen]

Output:

[{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]

This code uses a list comprehension to iterate over each item in the list items, checking if the key has been seen. If it hasn't, it adds the key to the seen set and includes the item in the resulting list. The or operator combines the set addition and list construction in a single expression: seen.add() returns None, so seen.add(item['id']) or item always evaluates to item.
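
The same seen-set idea can also be written as a small generator, which avoids the side effect inside the comprehension. The sketch below is one possible shape; the name unique_by and its key parameter are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Hashable, Iterable, Iterator


def unique_by(items: Iterable[Any], key: Callable[[Any], Hashable]) -> Iterator[Any]:
    """Yield items whose key has not been seen before, in input order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'},
]
result = list(unique_by(items, key=lambda d: d['id']))
print(result)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

Because it is a generator, it works lazily on any iterable and keeps the first occurrence of each key.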

Method 3: Using a Custom Function with filter()

By defining a custom function that preserves state about which keys have been seen, you can use Python's built-in filter() function to create a unique list. This method is functional in style and uses a closure to track seen keys.

Here's an example:

items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'}
]

def unique_filter():
    seen = set()  # fresh set on each call, so separate filters do not share state
    def seen_add(item):
        if item['id'] not in seen:
            seen.add(item['id'])
            return True
        return False
    return seen_add

result = list(filter(unique_filter(), items))

Output:

[{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]

This snippet defines a nested function seen_add() which filters the list of dictionaries. The filter() function applies seen_add() to each item, creating an iterator over unique items only, which is then converted back into a list.
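
A closure-based predicate is only reusable if each call to the factory starts with its own empty set. The sketch below illustrates that with a factory parameterized by key name; make_unique_filter and is_first are hypothetical names, not part of any library.

```python
def make_unique_filter(key_name):
    """Return a predicate that is True only the first time each key value appears."""
    seen = set()  # created on every call, so predicates do not share state

    def is_first(item):
        value = item[key_name]
        if value in seen:
            return False
        seen.add(value)
        return True

    return is_first


items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'},
]

by_id = list(filter(make_unique_filter('id'), items))
by_id_again = list(filter(make_unique_filter('id'), items))  # second pass still works
by_name = list(filter(make_unique_filter('name'), items))
print(by_id == by_id_again)
# True
```

If the set were shared between predicates, the second pass would see every key as already used and return an empty list.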

Method 4: Using itertools.groupby()

The itertools groupby() function groups consecutive elements that share a key. When the list is first sorted by the relevant key, you can pick the first element from each group to ensure uniqueness. Note that the result is ordered by the key rather than by original position in the list.

Here's an example:

from itertools import groupby

items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'}
]

# Sorting by 'id' key is essential for groupby to work correctly
sorted_items = sorted(items, key=lambda x: x['id'])
result = [next(group) for key, group in groupby(sorted_items, key=lambda x: x['id'])]

Output:

[{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]

This code chunk first sorts the list items by the key 'id'. It then uses the groupby() function from the itertools module to group items by their 'id'. Within the list comprehension, next(group) gets the first item of each group, ensuring each 'id' appears only once. Because sorted() is stable, that first item is the earliest occurrence of the 'id' in the original list.
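
Because the input is sorted first, the result comes back ordered by 'id', which may differ from the original order. The sketch below uses a hypothetical input whose ids are not already ascending to show the effect.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: id 2 appears before id 1.
items = [
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
]

get_id = itemgetter('id')
sorted_items = sorted(items, key=get_id)
grouped = [next(group) for _, group in groupby(sorted_items, key=get_id)]
print(grouped)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```

An order-preserving method would return Bob first here, so groupby() is a better fit when sorted output is acceptable or desired.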

Bonus One-Liner Method 5: Using a Dict Comprehension

A dict comprehension keyed on 'id', followed by a call to values(), collapses the list in a single expression. Note that dict.fromkeys() is not suitable here: it assigns the same value to every key, so it cannot map each 'id' to its own dictionary. This one-liner approach is compact but perhaps not as readable.

Here's an example:

items = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'},
    {'id': 1, 'name': 'Alice'}
]
result = list({item['id']: item for item in items}.values())

Output:

[{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]

The dict comprehension builds a new dictionary whose keys are the 'id' values of each item. When an 'id' repeats, the last instance becomes the value while the key keeps its original position. Taking the values of this dictionary yields a list of dictionaries that is unique by 'id'.

Summary/Discussion

  • Method 1: Loop with a Temporary Dictionary. Strengths: Simple and preserves order. Weaknesses: Not as concise as other methods.
  • Method 2: List Comprehension with Seen Set. Strengths: Pythonic and concise. Weaknesses: Slightly less readable due to the unusual use of or.
  • Method 3: Using a Custom Function with filter(). Strengths: Functional and modular. Weaknesses: Might be overkill for simple cases.
  • Method 4: Using itertools.groupby(). Strengths: Very powerful for more complex grouping. Weaknesses: Requires pre-sorting the list, which costs O(n log n) time and returns the result ordered by key instead of in the original order.
  • Bonus Method 5: One-Liner. Strengths: Extremely compact. Weaknesses: Readability can suffer, and the intent is not obvious at first glance.