5 Best Ways to Find the Number of Unique People from a List of Contact Mail IDs in Python

💡 Problem Formulation: When managing a list of email contacts, it’s essential to determine the number of unique individuals in the collection. Consider a raw input list that includes multiple email addresses, possibly with duplicates: ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]. The desired output is the count of unique email IDs, in this case, 2, representing john.doe@example.com and jane.smith@sample.org.

Method 1: Using a Set for Uniqueness

This method relies on the fact that a Python set stores each element only once. Converting a list into a set therefore discards all duplicates automatically. The approach is straightforward and efficient even for large lists, because building a set takes roughly linear time on average.

Here’s an example:

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_emails = set(emails)
unique_count = len(unique_emails)
print(unique_count)

Output: 2

This example turns the list emails into a set unique_emails, removing any duplicates. The length of the unique set is then counted using len(), providing the number of unique email addresses.
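The same idea can be wrapped in a small reusable helper. The sketch below is illustrative only: the name count_unique is hypothetical and not part of the original example. It assumes the items are hashable, and it accepts any iterable, including an empty list.

```python
def count_unique(items):
    """Return the number of distinct items in an iterable (hypothetical helper)."""
    # set() discards duplicates; len() then counts what remains.
    return len(set(items))


emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
print(count_unique(emails))  # 2
print(count_unique([]))      # 0
```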

Method 2: Using Dictionary Keys

Python dictionaries can’t have duplicate keys. Using this characteristic, one can add every item of the list as a key of a dictionary; repeated items simply map to the key that already exists, so only unique keys remain. This is an indirect way of achieving uniqueness. Unlike a set, a dictionary also keeps its keys in insertion order (guaranteed since Python 3.7), which helps when the order of the unique addresses matters.

Here’s an example:

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_emails = list(dict.fromkeys(emails))
unique_count = len(unique_emails)
print(unique_count)

Output: 2

The dictionary method fromkeys() creates a dictionary with the email addresses as keys and None as values. Converting that dictionary to a list yields its keys, which are the unique email addresses.
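One practical difference from a set is that dictionary keys keep their insertion order. The following minimal sketch shows this; the variable name ordered_unique is only illustrative.

```python
emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]

# dict keys keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each address determines its position.
ordered_unique = list(dict.fromkeys(emails))
print(ordered_unique)
# ['john.doe@example.com', 'jane.smith@sample.org']
```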

Method 3: Iterative Approach

An iterative method involves creating a new list and adding items to it only if they haven’t been added before. This manual method provides fine-grained control over the process and is quite simple to understand. However, every membership check scans the list built so far, so the running time grows roughly quadratically with the number of unique addresses and becomes slow for very large lists.

Here’s an example:

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_emails = []

for email in emails:
    if email not in unique_emails:
        unique_emails.append(email)

print(len(unique_emails))

Output: 2

Here, we iterate through each email ID in the emails list. If an email ID is not already in the unique_emails list, it gets appended, ensuring the list only contains unique items.
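If the loop structure is needed but the list is large, one common variation is to track the addresses already seen in a set, so that each membership check takes constant time on average. This is a sketch of that variation, not the method shown above; the helper variable seen is an addition for illustration.

```python
emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_emails = []
seen = set()  # addresses already added, for fast membership checks

for email in emails:
    if email not in seen:
        seen.add(email)
        unique_emails.append(email)  # keeps first-seen order

print(len(unique_emails))  # 2
```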

Method 4: Using Counter from Collections

The Counter class from Python’s collections module can also be used for counting unique elements. It is primarily designed to count how often each element occurs, but because it stores one key per distinct element, the length of a Counter object equals the number of unique elements.

Here’s an example:

from collections import Counter

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_emails_count = len(Counter(emails))
print(unique_emails_count)

Output: 2

A Counter object is created from the list of email addresses. Its length is the number of distinct keys it holds, which is the number of unique email addresses.
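Because a Counter keeps the occurrence counts, it can also report which addresses are duplicated, which a plain set cannot do. A brief sketch, with the variable names counts and duplicates chosen only for illustration:

```python
from collections import Counter

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
counts = Counter(emails)

# Addresses that occur more than once in the list.
duplicates = [email for email, n in counts.items() if n > 1]
print(duplicates)             # ['john.doe@example.com']
print(counts.most_common(1))  # [('john.doe@example.com', 2)]
```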

Bonus One-Liner Method 5: Using a Functional Approach with map() and lambda

A functional one-liner passes the list through map() with a lambda and collects the results in a set. In this example the lambda is an identity function, so the set does the actual deduplication; the pattern is mainly useful when the lambda transforms each address before comparison. It is a concise use of Python’s functional programming features, yet can be less readable to those unfamiliar with the concepts.

Here’s an example:

emails = ["john.doe@example.com", "jane.smith@sample.org", "john.doe@example.com"]
unique_count = len(set(map(lambda x: x, emails)))
print(unique_count)

Output: 2

The lambda function in this one-liner returns its input unchanged, so map() simply passes each email through to set(), which eliminates the duplicates. The result is the same as in Method 1.
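The map() step earns its place once the mapped function actually changes the input. The sketch below normalizes each address before deduplicating. The normalize function and the modified sample data are hypothetical, and treating addresses that differ only in letter case or surrounding whitespace as the same person is an assumption: the part before the @ is formally case-sensitive, although most mail providers ignore case in practice.

```python
emails = ["John.Doe@Example.com ", "john.doe@example.com", "jane.smith@sample.org"]


def normalize(email):
    """Hypothetical normalization: trim whitespace and lowercase the address."""
    return email.strip().lower()


# Without normalization the first two entries would count as different.
unique_count = len(set(map(normalize, emails)))
print(unique_count)  # 2
```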

Summary/Discussion

Method 1: Using a Set for Uniqueness. Fast and effective. Best for general use.
Method 2: Using Dictionary Keys. Creative use of dictionary properties. Also preserves the original order of the addresses.
Method 3: Iterative Approach. Simple and easy to understand. Performance may decrease with very large lists.
Method 4: Using Counter from Collections. Utilizes built-in library functions. More for counting than deduplication, but effective.
Method 5: Functional Approach with map() and lambda. Concise one-liner. Potentially confusing for beginners.