5 Best Ways to Replace Duplicates in a Python List

💡 Problem Formulation: Handling duplicate values in a list is a common task for Python developers. This article looks at replacing duplicate elements in a Python list with a placeholder value, so that only the first occurrence of each value remains and every other element keeps its position. Imagine we have an input list like [1, 2, 3, 2, 3, 4] and we want to replace duplicates with the value None, resulting in [1, 2, 3, None, None, 4].

Method 1: Using a Set to Track Duplicates

This method creates a set to keep track of seen items, then iterates through the original list. If an item is already in the set, it’s replaced; otherwise, it’s added to the set. Because set lookups and insertions take O(1) time on average, the whole pass runs in O(n) time. The items must be hashable.

Here’s an example:

def replace_duplicates_with_none(lst):
    seen = set()
    for index, item in enumerate(lst):
        if item in seen:
            lst[index] = None
        else:
            seen.add(item)
    return lst

original_list = [1, 2, 3, 2, 3, 4]
print(replace_duplicates_with_none(original_list))

Output:

[1, 2, 3, None, None, 4]

Here we define a function replace_duplicates_with_none() that takes a list as an argument. It iterates through the list, using a set to remember items it has already seen. If it comes across an item that it has seen before, it replaces the item in the list with None. Otherwise, it adds the item to the set of seen items.
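The function above changes the caller's list in place. As a minimal sketch of the same set-based idea that leaves the input untouched, the variant below builds a new list and accepts a custom placeholder. The name replaced_duplicates and the placeholder parameter are assumptions made for illustration, not part of the original example.

```python
def replaced_duplicates(items, placeholder=None):
    """Return a new list in which repeats of earlier items become `placeholder`."""
    seen = set()
    result = []
    for item in items:
        if item in seen:
            # Already encountered: emit the placeholder instead of the item
            result.append(placeholder)
        else:
            seen.add(item)
            result.append(item)
    return result

original_list = [1, 2, 3, 2, 3, 4]
new_list = replaced_duplicates(original_list, placeholder="dup")
print(new_list)        # [1, 2, 3, 'dup', 'dup', 4]
print(original_list)   # [1, 2, 3, 2, 3, 4] (unchanged)
```

A custom placeholder is useful when None is itself a legitimate value in the data.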

Method 2: List Comprehension with Enumeration

With this method, we use a list comprehension combined with enumerate(). It builds a new list in a single expression, with None in place of duplicates, and leaves the original list unmodified. It is compact and readable, but each original_list.index(item) call scans the list from the start, so the running time grows as O(n²) and the approach suits short lists best.

Here’s an example:

original_list = [1, 2, 3, 2, 3, 4]
new_list = [item if original_list.index(item) == idx else None for idx, item in enumerate(original_list)]
print(new_list)

Output:

[1, 2, 3, None, None, 4]

The list comprehension iterates through the original list with the help of enumerate() to get both index and item. For each element, it checks whether the current index equals the index returned by original_list.index(item), which is the index of the first occurrence of the item. If the indices match, it keeps the item; otherwise, it replaces it with None.
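One consequence worth seeing in action: index() compares with ==, so values that are equal across types count as duplicates. The sketch below shows this with a mixed list, followed by a type-aware variant. The variant uses a set of (type, value) keys, which is a different technique from the index check and is offered only as an assumption about what "duplicate" might need to mean in some data.

```python
mixed = [1, 1.0, True, "1"]

# index() uses ==, and 1 == 1.0 == True, so the later two are flagged
flagged = [item if mixed.index(item) == idx else None for idx, item in enumerate(mixed)]
print(flagged)  # [1, None, None, '1']

# Hypothetical type-aware variant: treat values as duplicates only when
# both their type and their value match
seen = set()
strict = []
for item in mixed:
    key = (type(item), item)
    strict.append(None if key in seen else item)
    seen.add(key)
print(strict)  # [1, 1.0, True, '1']
```

The set-based methods in this article behave like the first result, because equal numbers also hash equally.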

Method 3: Using a Dictionary

Another efficient method uses a dictionary that maps each value to the index of its first occurrence. An element is kept only when its own index matches the stored one; otherwise it is replaced. Slice assignment writes the result back into the same list object, so the original list is modified in place.

Here’s an example:

original_list = [1, 2, 3, 2, 3, 4]
seen = {}
# setdefault() stores the index on first sight and returns the stored index afterwards
original_list[:] = [x if seen.setdefault(x, i) == i else None for i, x in enumerate(original_list)]
print(original_list)

Output:

[1, 2, 3, None, None, 4]

The code snippet uses a dictionary to remember where each element first appeared. The setdefault method stores the current index under the element if the key is not already set, and in either case returns the stored index. For a first occurrence the returned index equals the current index, so the element is kept. For a repeat, the returned index points to the earlier position, the comparison fails, and None is used instead. The list is then rebuilt through slice assignment with None in place of duplicates.
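Because a dictionary can map each value to a position, the same idea adapts to a related policy. As a sketch under the assumption that you want to keep the last occurrence of each value rather than the first, the hypothetical helper keep_last_occurrence below records the final index of every value and replaces everything earlier.

```python
def keep_last_occurrence(items, placeholder=None):
    """Return a new list keeping only the last occurrence of each value."""
    # Later indices overwrite earlier ones, so each key ends up at its last index
    last_index = {item: idx for idx, item in enumerate(items)}
    return [
        item if last_index[item] == idx else placeholder
        for idx, item in enumerate(items)
    ]

print(keep_last_occurrence([1, 2, 3, 2, 3, 4]))  # [1, None, None, 2, 3, 4]
```

This needs two passes over the data, since the last index of a value is only known once the whole list has been seen.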

Method 4: Replace In-Place with Index Checking

Here we iterate over the list and, for each item, look up the index of its first occurrence with list.index(). If the current index isn’t that first index, we replace the item. This method modifies the original list in place without any extra sets or dictionaries, but each index() call scans the list, so the total cost grows as O(n²).

Here’s an example:

original_list = [1, 2, 3, 2, 3, 4]

for idx, item in enumerate(original_list):
    # index() returns the position of the first occurrence of item
    first_idx = original_list.index(item)
    if idx != first_idx:
        original_list[idx] = None

print(original_list)

Output:

[1, 2, 3, None, None, 4]

The loop finds the first occurrence of each item using original_list.index(item) and compares that index with the current one. If they differ, the current item is a repeat and is replaced with None. First occurrences are never overwritten, so index() keeps finding them even as later duplicates are replaced.
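One situation where the index-based check is useful despite its cost is a list of unhashable items, such as nested lists or dictionaries, which cannot be stored in a set. The sketch below wraps the same in-place loop in a function; the name replace_duplicates_in_place and the placeholder parameter are assumptions for illustration.

```python
def replace_duplicates_in_place(items, placeholder=None):
    """Replace every repeat in `items` with `placeholder`, using only == comparisons."""
    for idx, item in enumerate(items):
        # index() finds the first element equal to item; no hashing is involved
        if items.index(item) != idx:
            items[idx] = placeholder

rows = [[1, 2], [3, 4], [1, 2], {"a": 1}, {"a": 1}]
replace_duplicates_in_place(rows)
print(rows)  # [[1, 2], [3, 4], None, {'a': 1}, None]

# A set cannot hold these items at all
try:
    set([[1, 2], [3, 4]])
    set_worked = True
except TypeError:
    set_worked = False
print(set_worked)  # False
```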

Bonus One-Liner Method 5: Using a Generator and Set

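This method is a concise generator expression that uses a set for tracking. It replaces duplicates in a new list but doesn’t modify the original list in place.

Here’s an example:

original_list = [1, 2, 3, 2, 3, 4]
seen = set()
# seen.add(x) returns None, so the condition is true only when x was already seen
print(list(None if x in seen or seen.add(x) else x for x in original_list))

Output:

[1, 2, 3, None, None, 4]

The generator expression checks each item against the set. For a first occurrence, x in seen is false, so seen.add(x) runs; it records the item and returns None, which is falsy, and the item is kept. For a repeat, x in seen is true and None is emitted instead. If the goal is to remove duplicates rather than replace them, list(dict.fromkeys(original_list)) gives [1, 2, 3, 4], preserving the original order because dictionaries maintain insertion order from Python 3.7 onward.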
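The generator-plus-set idea named in this section can also be written as a reusable generator function, which works lazily on any iterable rather than only on lists. This is a minimal sketch; the name iter_replace_duplicates and the placeholder parameter are assumptions for illustration.

```python
from itertools import cycle, islice

def iter_replace_duplicates(iterable, placeholder=None):
    """Yield items one at a time, yielding `placeholder` for repeats."""
    seen = set()
    for item in iterable:
        if item in seen:
            yield placeholder
        else:
            seen.add(item)
            yield item

result = list(iter_replace_duplicates([1, 2, 3, 2, 3, 4]))
print(result)  # [1, 2, 3, None, None, 4]

# Works on an endless stream too, because items are produced on demand
first_five = list(islice(iter_replace_duplicates(cycle("ab")), 5))
print(first_five)  # ['a', 'b', None, None, None]
```

Note that the set grows with the number of distinct items, so memory use is bounded by the variety of values rather than the length of the stream.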

Summary/Discussion