5 Best Ways to Remove Duplicate Entries in a Python List


💡 Problem Formulation: When working with lists in Python, it is quite common to encounter duplicates, which can skew results or hurt performance. Suppose you have the list [3, 5, 2, 3, 8, 5] and you want to eliminate the duplicates while keeping the first occurrence of each element, yielding [3, 5, 2, 8]. This article provides multiple solutions to tackle such scenarios.

Method 1: Using a for Loop

This method involves iterating over the original list and appending unique elements to a new list. It is simple and intuitive, but the `item not in result` membership test scans the growing result list on every iteration, so the overall complexity is O(n^2), which makes it a poor fit for large lists.

Here’s an example:

def remove_duplicates(input_list):
    result = []
    for item in input_list:
        # Linear scan of result: keep only the first occurrence of each item.
        if item not in result:
            result.append(item)
    return result

print(remove_duplicates([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]

This function iterates through the input list and appends each item to the result only if it is not already present in the result, thus ensuring all elements are unique.
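If your elements are hashable, one common refinement (not shown above, so treat it as an illustrative sketch) is to track already-seen items in a set, which replaces the O(n) membership scan with an average O(1) lookup while still preserving order:

def remove_duplicates_fast(input_list):
    # Illustrative helper name; assumes the elements are hashable.
    # The set gives O(1) average membership checks, so the pass is roughly O(n).
    seen = set()
    result = []
    for item in input_list:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(remove_duplicates_fast([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]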

Method 2: Using set()

The set() data structure in Python is designed to store unique elements. This method converts the list to a set to remove duplicates and then back to a list. It is efficient, with an average complexity of O(n), but it only works when the elements are hashable, and the original order of elements is not preserved.

Here’s an example:

def remove_duplicates(input_list):
    return list(set(input_list))

print(remove_duplicates([3, 5, 2, 3, 8, 5]))

Output: [2, 3, 5, 8] (Note: Order might vary)

Converting the list to a set removes duplicates and converting it back to a list provides a duplicate-free list, but without preserving the original order of elements.
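If you want the speed of set() but also need the original order back, one workaround (an illustrative sketch, not part of the method above) is to sort the unique values by their first position in the original list; note that list.index() can make this quadratic again:

def remove_duplicates_ordered(input_list):
    # Illustrative helper name; sorts the unique values by first appearance.
    # list.index() is O(n) per lookup, so this can approach O(n^2) overall.
    return sorted(set(input_list), key=input_list.index)

print(remove_duplicates_ordered([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]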

Method 3: Using a Dictionary

Like sets, dictionaries in Python cannot have duplicate keys, and since Python 3.7 plain dictionaries also preserve insertion order. This method uses a dictionary to remove duplicates while keeping the order of first occurrences, thereby combining efficiency with ordered results.

Here’s an example:

def remove_duplicates(input_list):
    return list(dict.fromkeys(input_list))

print(remove_duplicates([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]

This code uses the dict.fromkeys() class method, which builds a dictionary whose keys are the unique elements of the list (in order of first appearance), and then converts those keys back into a list.
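On Python versions older than 3.7, where regular dictionaries do not guarantee insertion order, collections.OrderedDict offers the same trick; a minimal sketch (the helper name is illustrative):

from collections import OrderedDict

def remove_duplicates_legacy(input_list):
    # OrderedDict keeps insertion order on any Python version that has it.
    return list(OrderedDict.fromkeys(input_list))

print(remove_duplicates_legacy([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]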

Method 4: Using List Comprehensions

List comprehensions provide a succinct and readable way to create lists. This method removes duplicates by iterating through the list and including only the first occurrence of each element, maintaining the original order.

Here’s an example:

def remove_duplicates(input_list):
    # Keep an item only if it does not appear in the slice before index i.
    return [item for i, item in enumerate(input_list) if item not in input_list[:i]]

print(remove_duplicates([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]

The list comprehension here includes an item in the result only if it has not appeared before in the list up to the current index.
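Because the input_list[:i] slice is rescanned for every element, this approach is O(n^2) like Method 1. If the elements are hashable, a hedged alternative is to carry a seen set alongside the comprehension (the helper name is illustrative):

def remove_duplicates_seen(input_list):
    # set.add() returns None (falsy), so the second condition records each new
    # item in `seen` as a side effect while keeping only first occurrences.
    seen = set()
    return [item for item in input_list if item not in seen and not seen.add(item)]

print(remove_duplicates_seen([3, 5, 2, 3, 8, 5]))

Output: [3, 5, 2, 8]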

Bonus One-Liner Method 5: Using sorted() and itertools.groupby()

This method is handy when you need a one-liner and a sorted result is acceptable. It sorts the list first so that equal elements become adjacent, and then uses `itertools.groupby()` to collapse each run of duplicates into a single element. Note that sorting discards the original order of elements.

Here’s an example:

from itertools import groupby

print([key for key, _ in groupby(sorted([3, 5, 2, 3, 8, 5]))])

Output: [2, 3, 5, 8]

Sorting the list and then using groupby() allows us to iterate over grouped items, picking out only the unique ones, though the original order is not preserved.
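To see why the sort matters, note that groupby() only merges runs of equal neighbouring elements; applied to the unsorted sample list it leaves the non-adjacent duplicates untouched:

from itertools import groupby

# Without sorting, groupby() only collapses consecutive duplicates.
print([key for key, _ in groupby([3, 5, 2, 3, 8, 5])])

Output: [3, 5, 2, 3, 8, 5]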

Summary/Discussion

  • Method 1: Using a for Loop. Simple and intuitive. Not the best performance for large lists (a rough timing sketch follows this list).
  • Method 2: Using set(). Efficient and simple. Doesn’t preserve the order of elements.
  • Method 3: Using a Dictionary. Preserves order (on Python 3.7+). Efficient, roughly O(n), for lists of any size.
  • Method 4: Using List Comprehensions. Concise and preserves order. Could be less efficient for large lists.
  • Method 5: Using sorted() and itertools.groupby(). One-liner solution. Does not preserve original order and requires elements to be sortable.
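To put rough numbers behind those performance notes, the snippet below times Methods 1, 2, and 3 on a list of 10,000 random integers (the helper name is illustrative, and the exact figures you get depend on your machine and Python version; they are not from the original article):

import random
import timeit

data = [random.randrange(100) for _ in range(10_000)]

def dedup_loop(lst):
    # Method 1: quadratic membership checks against the growing result list.
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result

print("for loop:", timeit.timeit(lambda: dedup_loop(data), number=10))
print("set():   ", timeit.timeit(lambda: list(set(data)), number=10))
print("dict:    ", timeit.timeit(lambda: list(dict.fromkeys(data)), number=10))

On a typical machine the loop version comes out noticeably slower than the set() and dict.fromkeys() versions, which is consistent with the complexity notes above.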