5 Best Ways to Remove Duplicates from a List of Strings in Python

💡 Problem Formulation: In Python development, it’s common to work with lists of strings where duplicates may occur. The objective is to transform a list such as ['apple', 'orange', 'apple', 'pear', 'orange', 'banana'] into a duplicate-free version such as ['apple', 'orange', 'pear', 'banana'], ideally keeping each string in the position where it first appeared. This article explores five methods to achieve this and notes which of them preserve order.

Method 1: Using a Set to Remove Duplicates

This method relies on the defining property of a set: it cannot contain duplicate elements. A set is an unordered, mutable, iterable collection of hashable items, and strings are hashable, so converting a list of strings to a set discards repeated values in a single pass. The approach is short and fast because set insertion and lookup take constant time on average.

Here’s an example:

my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
no_duplicates = list(set(my_list))
print(no_duplicates)

Output:

['banana', 'pear', 'orange', 'apple']

This code snippet converts my_list to a set, which drops the duplicates, and then converts the set back to a list. Keep in mind that this method does not preserve the original order: the order of the result is arbitrary and can differ from the output shown above, even between runs of the same script.
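When the original order is not needed but a reproducible result is, sorting the set is one option. A minimal sketch, where unique_sorted is a name chosen for illustration:

```python
my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

# set() drops duplicates; sorted() returns a new list in alphabetical order,
# so the result is the same on every run.
unique_sorted = sorted(set(my_list))
print(unique_sorted)  # ['apple', 'banana', 'orange', 'pear']
```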

Method 2: Using List Comprehension and “not in”

List comprehension offers a concise way to create lists. Combined with a conditional check using “not in”, this method checks for duplicates as it builds a new list, maintaining the original order of elements.

Here’s an example:

my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
no_duplicates = []
[no_duplicates.append(x) for x in my_list if x not in no_duplicates]
print(no_duplicates)

Output:

['apple', 'orange', 'pear', 'banana']

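This code snippet iterates over each element in my_list and appends it to no_duplicates only if it is not already there, so elements keep the order in which they first appeared. Two caveats apply. First, the comprehension is used only for its side effect (it builds and discards a list of None values), so a plain for loop expresses the same logic more clearly. Second, each “not in” check scans no_duplicates from the start, so the running time grows roughly quadratically for lists with many unique items.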
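For longer lists, the repeated scan of a list can be avoided by tracking the items already seen in a set, whose membership test takes constant time on average. A minimal sketch of that variant, written as a plain loop; the name seen is illustrative:

```python
my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

seen = set()          # items already encountered (fast membership test)
no_duplicates = []    # first occurrences, in original order

for item in my_list:
    if item not in seen:
        seen.add(item)
        no_duplicates.append(item)

print(no_duplicates)  # ['apple', 'orange', 'pear', 'banana']
```

The result matches the list-based version, but each element is checked against the set rather than against the growing result list.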

Method 3: Using Dictionary Keys

Dictionary keys are unique, so converting the list into a dictionary whose keys are the list elements removes duplicates automatically. This method is similar to using a set, but it also preserves the order of first occurrence, because dictionaries are guaranteed to keep insertion order from Python 3.7 onwards.

Here’s an example:

my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
no_duplicates = list(dict.fromkeys(my_list))
print(no_duplicates)

Output:

['apple', 'orange', 'pear', 'banana']

The code leverages dict.fromkeys() to create a new dictionary with the list items as keys, and then it immediately turns the dictionary back into a list, stripping out any duplicates and preserving order.
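To see why this works, it can help to look at the intermediate dictionary. In the sketch below, as_dict is an illustrative name; dict.fromkeys() assigns None to every key when no value is given.

```python
my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

# Each string becomes a key; repeated keys collapse into one entry,
# and the first occurrence fixes the key's position.
as_dict = dict.fromkeys(my_list)
print(as_dict)
# {'apple': None, 'orange': None, 'pear': None, 'banana': None}

# Iterating a dict yields its keys, so list() recovers the unique strings.
print(list(as_dict))  # ['apple', 'orange', 'pear', 'banana']
```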

Method 4: Using a Function

For better reusability, you can wrap the deduplication logic in a function. The technique itself does not change; the function simply gives it a name, which makes calling code easier to read and lets you swap the implementation later in one place.

Here’s an example:

def remove_duplicates(my_list):
    return list(dict.fromkeys(my_list))

my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
no_duplicates = remove_duplicates(my_list)
print(no_duplicates)

Output:

['apple', 'orange', 'pear', 'banana']

This snippet defines a function remove_duplicates() that accepts a list and returns a new list without duplicates by using the dictionary keys method described in Method 3.
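A function also gives a natural place for options. As an illustration only, the hypothetical remove_duplicates_by() below accepts a key callable (an assumption, not part of the method above) so that strings can be compared in a normalized form, for example case-insensitively:

```python
def remove_duplicates_by(items, key=None):
    """Return the first occurrence of each item, comparing by key(item).

    Hypothetical helper for illustration; with key=None the items
    themselves are compared.
    """
    seen = set()
    result = []
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


fruits = ['Apple', 'orange', 'apple', 'ORANGE', 'pear']

# Exact comparison: every string here is distinct, so nothing is removed.
print(remove_duplicates_by(fruits))
# ['Apple', 'orange', 'apple', 'ORANGE', 'pear']

# Case-insensitive comparison: the first spelling of each word is kept.
print(remove_duplicates_by(fruits, key=str.lower))
# ['Apple', 'orange', 'pear']
```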

Bonus One-Liner Method 5: Using itertools and groupby

The itertools.groupby() function is a powerful tool for grouping consecutive items, and when combined with a list comprehension, it can be used to remove adjacent duplicates from a sorted list.

Here’s an example:

import itertools

my_list = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
# sorted() returns a new sorted list, so my_list itself is left unchanged
no_duplicates = [key for key, _ in itertools.groupby(sorted(my_list))]
print(no_duplicates)

Output:

['apple', 'banana', 'orange', 'pear']

This snippet sorts the list and then uses groupby() to collapse each run of equal adjacent items into a single key. Sorting is what makes this work, because groupby() only merges neighboring items; it also means the result comes back in alphabetical order rather than in the original order.
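The role of sorting is easiest to see by running groupby() on unsorted input. The sketch below uses a small hypothetical list, letters, chosen so that some duplicates are adjacent and others are not:

```python
import itertools

letters = ['a', 'a', 'b', 'a', 'b', 'b']

# Without sorting, only runs of equal neighbours are collapsed.
adjacent_only = [key for key, _ in itertools.groupby(letters)]
print(adjacent_only)  # ['a', 'b', 'a', 'b']

# After sorting, equal items sit next to each other, so every duplicate goes.
fully_unique = [key for key, _ in itertools.groupby(sorted(letters))]
print(fully_unique)   # ['a', 'b']
```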

Summary/Discussion

Method 1: Using a Set. Fast and simple. Does not maintain order.
Method 2: List Comprehension. Maintains order. Slow for large lists, because every “not in” check scans the result list.
Method 3: Using Dictionary Keys. Maintains order and is efficient. Relies on Python 3.7 or later for order preservation.
Method 4: Using a Function. Promotes reusability and readability. Essentially wraps another method in a function.
Bonus Method 5: Using itertools and groupby. Compact code. Requires sorting first for fully duplicate-free results, so the original order is lost.