5 Best Ways to Remove Duplicates from a List in Python

💡 Problem Formulation: In Python, lists are a common data structure used to store collections of items. However, lists can contain duplicate elements, which may not always be desired. For instance, given the input list [1, 2, 3, 2, 1, 5, 6, 5, 5], the desired output after removing duplicates is [1, 2, 3, 5, 6]. The challenge here is to remove duplicates efficiently and Pythonically.

Method 1: Using a Set

Transforming a list into a set automatically removes any duplicates, because a Python set cannot contain the same element twice. This method is short and fast even for large lists, since each insertion is an average O(1) hash operation. However, sets have no defined order, so the original order of the list is not preserved, and every element must be hashable.

Here’s an example:

numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
unique_numbers = list(set(numbers))
print(unique_numbers)

Output (order not guaranteed): [1, 2, 3, 5, 6]

This code snippet creates a set from the original list, which removes the duplicates, and then converts the set back into a list. The order in which the unique numbers appear in the output list may not match their original order.
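If a deterministic order is enough and it does not have to be the original one, one option is to sort the set. The sketch below reuses the example list; the pairs list is made-up illustration data that shows a second limitation of this method: set() only accepts hashable elements, so a list of lists raises a TypeError.

```python
numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]

# set() drops duplicates but promises no order; sorted() then
# returns the unique values in ascending order.
unique_sorted = sorted(set(numbers))
print(unique_sorted)  # [1, 2, 3, 5, 6]

# Set elements must be hashable, so a list of lists is rejected.
pairs = [[1, 2], [1, 2], [3, 4]]  # illustrative data
try:
    set(pairs)
except TypeError as exc:
    print("set() failed:", exc)
```

Ascending order only matches the original order here because the example list happens to introduce its values in increasing order.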

Method 2: Using Dictionary Keys

Dictionaries in Python cannot have duplicate keys, so building a dictionary from a list with the dict.fromkeys() method removes duplicates. Moreover, since Python 3.7, dictionaries are guaranteed to preserve insertion order, so each element stays at the position of its first occurrence.

Here’s an example:

numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)

Output: [1, 2, 3, 5, 6]

The code snippet uses dict.fromkeys() to create a new dictionary with the list items as keys (each mapped to None), which removes duplicates. Converting this dictionary back to a list yields its keys, giving a duplicate-free list that preserves the original order of the elements.
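Order preservation is easier to see with values whose first-occurrence order is not already sorted. The word list below is made-up example data, and the sorted set is printed only for comparison:

```python
words = ["pear", "apple", "pear", "fig", "apple"]

# Keys keep the position where each word first appeared.
unique_words = list(dict.fromkeys(words))
print(unique_words)  # ['pear', 'apple', 'fig']

# For comparison, sorting a set gives alphabetical order instead.
print(sorted(set(words)))  # ['apple', 'fig', 'pear']
```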

Method 3: List Comprehension with Membership Test

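This method loops over each element in the original list and appends it to a new list only if it is not already there, effectively filtering out duplicates. It preserves the original order and, because it only compares elements with ==, also works for unhashable items such as nested lists. However, each not in check scans the new list, so the total cost grows quadratically with the list size; it is best suited to small and medium-sized lists.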

Here’s an example:

numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
unique_numbers = []
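# The comprehension is evaluated only for its side effect: append() adds new items,
# and the list of None values it returns is discarded.
[unique_numbers.append(n) for n in numbers if n not in unique_numbers]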
print(unique_numbers)

Output: [1, 2, 3, 5, 6]

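The code snippet uses a list comprehension to iterate through the original list and append each number to the new list only if it is not already present; the membership test not in ensures that every element appears only once. The comprehension is evaluated purely for this side effect: the list it builds, full of the None values returned by append(), is thrown away, which is why a plain for loop is often considered the more idiomatic way to write it.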
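The same logic reads more naturally as an ordinary for loop, which avoids building a throwaway list. The helper name dedupe_keep_order is hypothetical, chosen only for this sketch; the second call uses nested lists to show that the membership test also handles unhashable items, which the set- and dict-based methods cannot.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences.

    Uses a list membership test, so it works for unhashable items,
    but each check is a linear scan (O(n^2) overall in the worst case).
    """
    result = []
    for item in items:
        if item not in result:  # linear scan of result
            result.append(item)
    return result


print(dedupe_keep_order([1, 2, 3, 2, 1, 5, 6, 5, 5]))  # [1, 2, 3, 5, 6]
print(dedupe_keep_order([[1, 2], [1, 2], [3, 4]]))  # [[1, 2], [3, 4]]
```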

Method 4: Using collections.OrderedDict

The OrderedDict class from the collections module also ensures that keys are unique and remembers insertion order. It is mainly useful on Python versions before 3.7, where plain dictionaries did not guarantee insertion order.

Here’s an example:

from collections import OrderedDict
numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
unique_numbers = list(OrderedDict.fromkeys(numbers))
print(unique_numbers)

Output: [1, 2, 3, 5, 6]

This snippet creates an OrderedDict which, similar to dict.fromkeys(), filters out duplicates and maintains the order of the elements. After that, it converts the OrderedDict back to a list to give a duplicate-free list.
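Code that must run on both old and new interpreters could choose between Methods 2 and 4 at runtime. This is a minimal sketch, assuming insertion order is the only concern; unique_in_order is a hypothetical helper name.

```python
import sys
from collections import OrderedDict


def unique_in_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    if sys.version_info >= (3, 7):
        # Plain dicts are guaranteed to keep insertion order from 3.7 on.
        return list(dict.fromkeys(items))
    # Older interpreters: OrderedDict provides the same ordering guarantee.
    return list(OrderedDict.fromkeys(items))


print(unique_in_order([1, 2, 3, 2, 1, 5, 6, 5, 5]))  # [1, 2, 3, 5, 6]
```

On Python 3.7 and later the dict branch is the one that runs. Using OrderedDict unconditionally would also work, at the cost of a typically heavier object.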

Bonus One-Liner Method 5: The More Itertools Approach

The unique_everseen function from the third-party more_itertools package yields unique elements in the order they first appear. Because the package is not part of Python's standard library, it must be installed first (for example with pip install more-itertools).

Here’s an example:

from more_itertools import unique_everseen
numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
unique_numbers = list(unique_everseen(numbers))
print(unique_numbers)

Output: [1, 2, 3, 5, 6]

unique_everseen returns a lazy iterator that skips any element it has already seen, so the code wraps it in list() to obtain a duplicate-free list in the original order. It's a compact and readable one-liner for those who don't mind an external dependency.
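If adding a dependency is not an option, the same idea can be written with the standard library alone. The generator below is a simplified sketch modeled on the unique_everseen recipe in the itertools documentation, not the more_itertools source. The name unique_everseen_sketch is hypothetical, and unlike the real function it assumes every element (or key) is hashable. The optional key argument mirrors the key parameter of more_itertools.unique_everseen.

```python
def unique_everseen_sketch(iterable, key=None):
    """Yield elements in order of first appearance, skipping repeats.

    Simplified sketch: assumes every element (or key(element)) is hashable.
    """
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:  # average O(1) hash lookup
            seen.add(k)
            yield element


numbers = [1, 2, 3, 2, 1, 5, 6, 5, 5]
print(list(unique_everseen_sketch(numbers)))  # [1, 2, 3, 5, 6]

# Case-insensitive de-duplication: the first spelling seen is kept.
print(list(unique_everseen_sketch(["A", "b", "a", "B"], key=str.lower)))  # ['A', 'b']
```

Because the already-seen values live in a set, the whole pass is roughly linear, unlike the list-based membership test in Method 3.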

Summary/Discussion

Method 1: Using a Set. Quick and easy. Does not maintain order.
Method 2: Using Dictionary Keys. Simple and ensures order is maintained. Typically uses more memory than a set, since every key also stores a value.
Method 3: List Comprehension with Membership Test. Easy to understand, maintains order, and works with unhashable elements. Runs in quadratic time, because each membership test scans the new list, so it becomes slow for large lists.
Method 4: Using collections.OrderedDict. Ensures both uniqueness and order, similar to Method 2, but more verbose. Useful for legacy Python versions.
Bonus One-Liner Method 5: The More Itertools Approach. Clean and maintains order. Requires an additional package not in the standard library.
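The efficiency claims above can be checked on your own machine. This rough benchmark is a sketch: the data size and repeat count are arbitrary assumptions, and the helper names are hypothetical. It first confirms that the order-preserving standard-library methods agree, then times them with timeit. Absolute numbers vary by machine and Python version, so no expected timings are shown.

```python
import timeit
from collections import OrderedDict


def via_dict(items):
    return list(dict.fromkeys(items))


def via_ordereddict(items):
    return list(OrderedDict.fromkeys(items))


def via_membership(items):
    result = []
    for item in items:
        if item not in result:  # linear scan, the source of the O(n^2) cost
            result.append(item)
    return result


# Hypothetical test data: 500 distinct values, each repeated 10 times.
data = [i % 500 for i in range(5_000)]

# All three order-preserving approaches should produce the same list.
assert via_dict(data) == via_ordereddict(data) == via_membership(data)

for func in (via_dict, via_ordereddict, via_membership):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__:>16}: {seconds:.4f} s for 10 runs")
```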