5 Best Ways to Remove Duplicates from a List of Dictionaries in Python

πŸ’‘ Problem Formulation: When working with lists of dictionaries in Python, one common task is to remove duplicates – dictionaries with identical key-value pairs. For instance, consider a list of dictionaries where each dictionary represents a unique data record. Duplicates may arise due to data entry errors or during data processing. Our goal is to eliminate these duplicates, leaving only unique dictionaries. If we have input like [{"id": 1}, {"id": 2}, {"id": 1}], the desired output would be [{"id": 1}, {"id": 2}].
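Note that the usual trick of converting a list to a set does not work here, because dictionaries are mutable and therefore unhashable. A quick sketch of the failure that motivates the methods below:

```python
list_of_dicts = [{"id": 1}, {"id": 2}, {"id": 1}]

# Dictionaries are unhashable, so a direct set() conversion fails.
try:
    unique = set(list_of_dicts)
except TypeError as e:
    print(e)  # unhashable type: 'dict'
```

Each of the five methods below works around this limitation in a different way.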

Method 1: Iterative Comparison

This method involves iterating over the list and checking each dictionary against those already collected. It is straightforward but runs in quadratic time, since each membership test scans the result list, so it is a poor fit for large lists. Functionally, we create a new list and only add dictionaries that have not already been added.

Here’s an example:

list_of_dicts = [{"id": 1}, {"id": 2}, {"id": 1}]

result = []
for d in list_of_dicts:
    if d not in result:
        result.append(d)

print(result)

Output: [{'id': 1}, {'id': 2}]

This code snippet initializes an empty list result. It then iterates through list_of_dicts and appends a dictionary to result only if it is not already present, thereby removing duplicates.

Method 2: Using a Set for Uniqueness

To make this process more efficient, we can use a set to store a unique, hashable identifier for each dictionary. This method assumes that all dictionary values are hashable and reduces the membership check to constant time. We use a sorted tuple of each dictionary's items as its unique identifier.

Here’s an example:

seen = set()
result = []
for d in list_of_dicts:
    identifier = tuple(sorted(d.items()))
    if identifier not in seen:
        seen.add(identifier)
        result.append(d)

Output: [{'id': 1}, {'id': 2}]

In this code, for each dictionary, a sorted tuple of its items is created for hashability. The tuple is added to a set seen if not already present. The corresponding dictionary is added to the result list if it’s a new unique tuple, thus removing duplicates.
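One caveat worth knowing: the sorted tuple of items is only hashable if the dictionary's values are themselves hashable. A short sketch of where this breaks, using a hypothetical records list with list-valued entries:

```python
records = [{"tags": ["a", "b"]}, {"tags": ["a", "b"]}]

# tuple(sorted(d.items())) still contains the list value,
# so hashing it for set membership raises a TypeError.
try:
    seen = {tuple(sorted(d.items())) for d in records}
except TypeError as e:
    print(e)  # unhashable type: 'list'
```

Method 4 below addresses exactly this case.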

Method 3: List Comprehension with a Filter

This method combines list comprehension with a filter condition – in essence, a more Pythonic way of writing Method 1 or 2 with less code and in a more readable format. It uses the same principles but leverages Python’s syntactic sugar.

Here’s an example:

seen = set()
result = [d for d in list_of_dicts
          if (t := tuple(sorted(d.items()))) not in seen and not seen.add(t)]

Output: [{'id': 1}, {'id': 2}]

This snippet uses a list comprehension with a condition that filters out duplicates by using a set for uniqueness. The technique exploits the fact that set.add() returns None, which is falsy, so the call can be embedded directly in the filter condition.

Method 4: Using JSON Serialization

Method 4 uses JSON serialization to convert dictionaries to strings and makes use of sets for uniqueness. This is helpful when the dictionaries contain unhashable values such as lists, which rule out the tuple-of-items approach.

Here’s an example:

import json

seen = set()
result = []
for d in list_of_dicts:
    serialized = json.dumps(d, sort_keys=True)
    if serialized not in seen:
        seen.add(serialized)
        result.append(d)

Output: [{'id': 1}, {'id': 2}]

The snippet serializes each dictionary into a JSON string which is inherently hashable. The unique JSON strings are tracked in a set to ensure duplicates are not added to the result list.

Bonus One-Liner Method 5: Coding with functools.reduce

This method, albeit complex, provides a one-liner for removing duplicates. It uses functools.reduce to fold the list into an accumulator, adding each dictionary only if it has not been seen before. This is for those who love functional programming tricks.

Here’s an example:

from functools import reduce

result = reduce(lambda r, d: r if d in r else r + [d], list_of_dicts, [])

Output: [{'id': 1}, {'id': 2}]

This one-liner reduces the list by starting with an empty list and accumulating each dictionary only if it is not already present in the accumulator, r. Like Method 1, the membership test scans the accumulator, so it is also quadratic in time.

Summary/Discussion

Method 1: Iterative Comparison. Simple to understand. Not efficient for large data sets.

Method 2: Using a Set for Uniqueness. Improved efficiency. Requires hashable dictionary values.

Method 3: List Comprehension with a Filter. Pythonic and concise. Still requires hashable dictionary values.

Method 4: Using JSON Serialization. Useful when dictionaries contain unhashable objects. Might introduce performance overhead due to serialization.

Method 5: Coding with functools.reduce. A functional programming twist. Compact but could be harder to read for those not familiar with functional programming paradigms.
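As a closing sketch, the trade-offs above can be folded into one hypothetical helper, here called dedupe_dicts, which tries the fast tuple-of-items identifier from Method 2 and falls back to JSON serialization from Method 4 when a value is unhashable. The function name and fallback strategy are illustrative choices, assuming dictionary keys are mutually sortable:

```python
import json

def dedupe_dicts(dicts):
    """Return the dictionaries with duplicates removed, preserving order.

    Uses a sorted tuple of items as the identity key (Method 2) and
    falls back to a JSON string (Method 4) when a value is unhashable,
    e.g. a list.
    """
    seen = set()
    result = []
    for d in dicts:
        try:
            key = tuple(sorted(d.items()))
            hash(key)  # raises TypeError if any value is unhashable
        except TypeError:
            key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result

print(dedupe_dicts([{"id": 1}, {"id": 2}, {"id": 1}]))
# [{'id': 1}, {'id': 2}]
```

This keeps the fast path for the common case while still handling dictionaries that Method 2 alone would reject.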