5 Best Ways to Check a List for Duplicates in Python

Problem Formulation and Solution Overview

In this article, you’ll learn how to check a List for Duplicates in Python.

To make it more fun, we have the following running scenario:

The Finxter Academy has given you an extensive list of usernames. Somewhere along the line, duplicate entries were added. They need you to check if their List contains duplicates. For testing purposes, a small sampling of this List is used.

💬 Question: How would we write Python code to check a List for duplicate elements?

We can accomplish this task by one of the following options:

  • Method 1: Use set() and List to return a Duplicate-Free List
  • Method 2: Use set(), For loop, and List to return a List of Duplicates Found
  • Method 3: Use a For loop to return Duplicates and Counts
  • Method 4: Use any() to check for Duplicates and return a Boolean
  • Method 5: Use List Comprehension to return a List of all Duplicates

Method 1: Use set() and List to return a Duplicate-Free List

This method uses set(), which removes any duplicate values (set(users)), to produce a duplicate-free set. This set is then converted to a List (list(set(users))).

users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

dup_free = list(set(users))
print(dup_free)

This code declares a small sampling of Finxter usernames and saves them to users.

Next, set() is called with users as its argument. The resulting set is then converted to a List and saved to dup_free.

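If the set were output to the terminal before being converted to a List, the result would be a set, which is unordered and not subscriptable. This means its elements cannot be accessed by index, although the set can still be looped over and tested for membership. The order of the elements shown may differ from run to run.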

Output

{'csealker', 'cdriver', 'shoeguy', 'ollie3', 'kyliek', 'stewieboy', 'AmyP'}

💡Note: Any attempt to access a set element by index (for example, set(users)[0]) raises a TypeError stating that the 'set' object is not subscriptable.
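
As a small illustration of the note above, the sketch below indexes a set inside a try/except block and then shows two operations a set does support. The shortened users List and the names unique_users and message are chosen here for illustration only.

```python
users = ['AmyP', 'ollie3', 'shoeguy', 'ollie3']
unique_users = set(users)

try:
    unique_users[0]              # indexing a set is not supported
except TypeError as err:
    message = str(err)
    print(message)               # 'set' object is not subscriptable

print('AmyP' in unique_users)    # membership test works: True
print(sorted(unique_users))      # sorted() returns a List: ['AmyP', 'ollie3', 'shoeguy']
```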

In this example, the set is converted to a List, so a List of duplicate-free values displays. Because a set is unordered, the original order of users is not preserved.

Output

['csealker', 'cdriver', 'shoeguy', 'ollie3', 'kyliek', 'stewieboy', 'AmyP']

💡Note: Calling set() with no argument returns an empty set.
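
Method 1 removes duplicates but does not say whether any were present, and it loses the original order. The sketch below shows one possible way to cover both points: comparing lengths gives a Boolean, and dict.fromkeys() keeps the first occurrence of each username in order. The names has_dups and dup_free_ordered are illustrative and not part of the original example.

```python
users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

# A set drops repeats, so a shorter set means users held duplicates.
has_dups = len(set(users)) != len(users)
print(has_dups)           # True

# Dictionary keys are unique and keep insertion order (Python 3.7+).
dup_free_ordered = list(dict.fromkeys(users))
print(dup_free_ordered)
# ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'stewieboy', 'csealker', 'cdriver']
```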


Method 2: Use set(), For loop, and List to return a List of Duplicates Found

This method uses set() and a For loop written as a generator expression to check for and collect any Duplicates found (set(x for x in users if (x in tmp or tmp.add(x)))), saving them to dups. The set is then converted to a List (print(list(dups))).

Here’s an example:

users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']
         
tmp  = set()
dups = set(x for x in users if (x in tmp or tmp.add(x)))
print(list(dups))

This code declares a small sampling of Finxter usernames and saves them to users.

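Next, a new empty set, tmp, is declared. The generator expression then loops through each element of users. If the element is not yet in tmp, tmp.add(x) adds it and returns None, so the condition is false and the element is skipped. If the element is already in tmp, it is a duplicate and is passed along to set(). The results save to dups as a set.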

Output

In this example, the set is converted to a List and displays the duplicate values found in the original List, users. Because a set is unordered, the order of the output may vary.

['kyliek', 'ollie3', 'shoeguy']
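
The one-line version relies on tmp.add(x) returning None, which can be hard to read. As a rough equivalent, the same logic can be written as an explicit For loop. The name seen is used here in place of tmp for illustration, and sorted() is applied only to make the output order predictable.

```python
users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

seen = set()   # usernames encountered so far
dups = set()   # usernames encountered more than once

for name in users:
    if name in seen:
        dups.add(name)    # already seen, so it is a duplicate
    else:
        seen.add(name)    # first occurrence

print(sorted(dups))       # ['kyliek', 'ollie3', 'shoeguy']
```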

Method 3: Use a For loop to return Duplicates and Counts

This method uses a For loop to check each element of users while keeping track of all usernames and the number of times they appear. The result is a Dictionary of Duplicates containing each duplicated username and its count.

Here’s an example:

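users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

count = {}       # every username and the number of times seen so far
dup_count = {}   # only usernames seen more than once
for i in users:
    if i not in count:
        count[i] = 1
    else:
        count[i] += 1
        dup_count[i] = count[i]
print(dup_count)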

This code declares two (2) empty Dictionaries, count and dup_count respectively.

A For loop is instantiated to loop through each element of users and does the following:

  • If the element i is not in count, its entry is created and set to one (count[i]=1).
  • If the element i is found in count, the code falls to else, where one (1) is added to its entry (count[i]+=1) and the updated total is saved to dup_count (dup_count[i]=count[i]).

This code repeats until the end of users has been reached.

At this point, a Dictionary containing the Duplicates and the number of times they appear displays.

Output

{'ollie3': 2, 'shoeguy': 2, 'kyliek': 2}
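
The manual counting above can also be done with collections.Counter from the standard library. The sketch below is an alternative to the For loop, not a replacement the original article prescribes; it should produce the same Dictionary for this sample.

```python
from collections import Counter

users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

# Counter tallies how many times each username appears.
counts = Counter(users)

# Keep only the usernames that appear more than once.
dup_count = {name: n for name, n in counts.items() if n > 1}
print(dup_count)   # {'ollie3': 2, 'shoeguy': 2, 'kyliek': 2}
```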

Method 4: Use any() to check for Duplicates and return a Boolean

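This example uses any() with a generator expression that counts how often each element of users appears. If any element appears more than once, True returns. Otherwise, False returns. Because users.count(x) scans the whole List for every element, this method is best used on small Lists.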

users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

dups = any(users.count(x) > 1 for x in users)
print(dups)

This code declares a small sampling of Finxter usernames and saves them to users.

Next, any() is called and loops through each element of users, checking to see if the element is a duplicate. As soon as one is found, any() stops and returns True. If none is found, it returns False. The result saves to dups and the output displays as follows:

Output

True
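
For larger Lists, a single pass with a set avoids calling users.count() for every element. The helper below is a minimal sketch under the assumption that all elements are hashable (strings are); the function name has_duplicates is hypothetical and does not come from the original article.

```python
def has_duplicates(items):
    """Return True as soon as a repeated element is found."""
    seen = set()
    for item in items:
        if item in seen:
            return True       # stop early on the first repeat
        seen.add(item)
    return False


users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

print(has_duplicates(users))               # True
print(has_duplicates(['AmyP', 'ollie3']))  # False
```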

Method 5: Use List Comprehension to return a List of all Duplicates

This method uses List Comprehension to loop through users, checking for duplicates. Every element that appears two or more times is added to dups, so each duplicated username is listed once per occurrence.

Here’s an example:

users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

dups = [x for x in users if users.count(x) >= 2]
print(dups)

This code declares a small sampling of Finxter usernames and saves them to users.

Next, List Comprehension extracts the duplicate usernames and saves them to the List dups. The duplicate values are then output to the terminal.

Output

['ollie3', 'shoeguy', 'kyliek', 'ollie3', 'shoeguy', 'kyliek']
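
If listing every occurrence is more than needed, one possible variation is to keep an element only when it has already appeared earlier in the List. The sketch below uses enumerate() and a slice for this; the name later_repeats is illustrative. Like the original, it rescans the List for each element, so it suits small Lists. A username that appears three times would be listed twice.

```python
users = ['AmyP', 'ollie3', 'shoeguy', 'kyliek', 'ollie3',
         'stewieboy', 'csealker', 'shoeguy', 'cdriver', 'kyliek']

# Keep x only if it also occurs somewhere before position i.
later_repeats = [x for i, x in enumerate(users) if x in users[:i]]
print(later_repeats)   # ['ollie3', 'shoeguy', 'kyliek']
```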

Summary

These five (5) methods of checking a List for Duplicates should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!