5 Best Ways to Find Duplicates in a List of Strings in Python

💡 Problem Formulation: When working with lists in Python, a common task is identifying strings that appear more than once. For instance, given the input list ['apple', 'banana', 'cherry', 'apple', 'date', 'banana'], the goal is to output the duplicates: ['apple', 'banana']. This article describes five methods for detecting these duplicates in a list of strings.

Method 1: Using a Loop and a Dictionary

A traditional way to find duplicates involves iterating over the list and storing the count of each element in a dictionary. This method is straightforward and does not require any additional libraries.

Here’s an example:

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
counts = {}

# Tally how many times each string occurs
for string in strings:
    counts[string] = counts.get(string, 0) + 1

# Keep only the strings that were seen more than once
duplicates = [string for string, count in counts.items() if count > 1]

print(duplicates)

Output:

['apple', 'banana']

This piece of code iterates through the list, using the get method to update the dictionary with the count of each string. Afterward, it uses a list comprehension to generate a list of elements that have a count greater than 1.
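The same counting logic can be wrapped in a reusable function. The sketch below is one possible way to do that; the name find_duplicates is hypothetical and not part of any library. It assumes Python 3.7 or later, where dictionaries keep insertion order, so duplicates come back in the order they were first seen.

```python
def find_duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = {}
    for item in items:
        # get() returns 0 the first time a value is encountered
        counts[item] = counts.get(item, 0) + 1
    return [item for item, count in counts.items() if count > 1]


strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
print(find_duplicates(strings))  # ['apple', 'banana']
print(find_duplicates([]))       # []
```

Keeping the tally in its own variable, separate from the final result, also makes the two steps (count, then filter) easier to follow.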

Method 2: Using collections.Counter

The collections module provides a specialized Counter class that makes finding duplicates very concise. Its purpose is to support convenient and rapid tallies.

Here’s an example:

from collections import Counter

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
duplicates = [item for item, count in Counter(strings).items() if count > 1]

print(duplicates)

Output:

['apple', 'banana']

This snippet creates a Counter object from the list, which automatically counts the number of occurrences of each string. Then a list comprehension is used to extract the items with a count greater than one.
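If the number of occurrences matters as well, the counts can be kept next to each duplicate. The following is a small sketch along those lines; the variable name duplicate_counts is chosen only for illustration.

```python
from collections import Counter

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']

# Counter tallies every string in a single pass over the list
counts = Counter(strings)

# Keep the count alongside each string that occurs more than once
duplicate_counts = {item: count for item, count in counts.items() if count > 1}

print(duplicate_counts)  # {'apple': 2, 'banana': 2}
```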

Method 3: Using Sets for Comparison

A set can be used to collapse repeated matches: collect every string whose list.count() is greater than 1, then convert the result to a set so that each duplicate is reported only once.

Here’s an example:

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
duplicates = set([string for string in strings if strings.count(string) > 1])

print(duplicates)

Output:

{'banana', 'apple'}

This code uses a list comprehension to collect every element that appears more than once, then converts that list to a set so that each duplicate is listed only once. Because sets are unordered, the order of the printed elements may vary between runs.
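Sets are also handy for two related tasks: checking whether a list contains any duplicates at all, and producing a repeatable output order. The sketch below illustrates both; it is a variation on the method above rather than a replacement for it.

```python
strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']

# A set drops repeated values, so a shorter set means at least one duplicate
has_duplicates = len(set(strings)) < len(strings)
print(has_duplicates)  # True

# Sorting the set of duplicates gives a stable, repeatable order
duplicates = sorted({s for s in strings if strings.count(s) > 1})
print(duplicates)  # ['apple', 'banana']
```

The length comparison only answers whether duplicates exist; it does not say which strings are repeated.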

Method 4: Using List Comprehension and enumerate()

This method uses a list comprehension with enumerate() to check whether each element appears again later in the list.

Here’s an example:

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
duplicates = list(set([string for index, string in enumerate(strings) if string in strings[index+1:]]))

print(duplicates)

Output:

['banana', 'apple']

This code iterates over each item together with its index and keeps the item if it appears again in the slice after the current index. The result is converted to a set and back to a list so that each duplicate appears once. Because of the set conversion, the order of the output may vary.
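When the order of first occurrence should be preserved, one option is to deduplicate with dict.fromkeys() instead of a set. This is a sketch that assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']

# Yield each string that shows up again somewhere after its own position
repeated = (s for i, s in enumerate(strings) if s in strings[i + 1:])

# dict.fromkeys() drops repeats while keeping first-seen order
duplicates = list(dict.fromkeys(repeated))

print(duplicates)  # ['apple', 'banana']
```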

Bonus One-Liner Method 5: Using a Functional Approach

A functional approach using the filter() function can elegantly extract duplicates with a one-liner.

Here’s an example:

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']
duplicates = set(filter(lambda x: strings.count(x) > 1, strings))

print(duplicates)

Output:

{'apple', 'banana'}

The filter() function keeps the elements for which strings.count(x) is greater than 1, that is, those that appear more than once in the list. As before, converting to a set ensures each duplicate appears once, and the printed order may vary.
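Calling strings.count() inside the lambda rescans the list for every element. A possible variation, sketched here, precomputes the counts once with collections.Counter and lets filter() look them up; sorted() is added only to make the output order repeatable.

```python
from collections import Counter

strings = ['apple', 'banana', 'cherry', 'apple', 'date', 'banana']

# Count every string once, up front
counts = Counter(strings)

# filter() now does a dictionary lookup instead of rescanning the list
duplicates = sorted(set(filter(lambda x: counts[x] > 1, strings)))

print(duplicates)  # ['apple', 'banana']
```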

Summary/Discussion

Method 1: Loop with Dictionary. Strength – Simple, needs no imports, and counts everything in a single pass over the list. Weakness – More verbose than the Counter version.
Method 2: collections.Counter. Strength – Clean, Pythonic, and also a single pass over the list. Weakness – Requires an import from the standard-library collections module.
Method 3: Using Sets for Comparison. Strength – Easy to understand. Weakness – Inefficient on large lists because list.count() rescans the whole list for every element (quadratic time); the resulting set is unordered.
Method 4: List Comprehension with enumerate(). Strength – Compact and no extra imports. Weakness – Quadratic time, since each check creates a slice and scans the rest of the list; the set conversion loses the original order.
Method 5: Functional One-Liner. Strength – Condensed code. Weakness – Same quadratic cost as Method 3, and readability might suffer for those not familiar with functional programming concepts.
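To see how the efficiency differences play out on a given machine, the approaches can be timed with the standard-library timeit module. The sketch below compares a Counter-based version against a list.count() version; the function names and the generated test data are hypothetical, and the measured times will differ from system to system.

```python
import timeit
from collections import Counter


def with_counter(items):
    # Single pass: tally first, then filter
    return {item for item, count in Counter(items).items() if count > 1}


def with_count(items):
    # Rescans the whole list once per element
    return {item for item in items if items.count(item) > 1}


# Hypothetical test data: 1,000 strings, each value appearing twice
data = [f"item{i % 500}" for i in range(1000)]

# Both approaches should agree on which strings are duplicated
assert with_counter(data) == with_count(data)

for func in (with_counter, with_count):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__}: {seconds:.4f} s")
```

On small lists such as the six-element example used throughout this article, the difference is unlikely to matter, so readability can reasonably drive the choice.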