5 Best Ways to Remove Duplicate Words from a Sentence in Python

March 11, 2024 by Emily Rosemary Collins

💡 Problem Formulation: Duplicate words within a sentence can detract from the clarity and conciseness of the message. This article explores Python solutions for removing repeated words from a given sentence while keeping the first occurrence of each word in its original position. For example, the input “Python is great and Python is fun” should yield an output of “Python is great and fun”.
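All of the methods below compare words exactly, so “Python” and “python” (or “fun” and “fun.”) count as different words. If you need case-insensitive matching, here is a minimal sketch that uses str.casefold() as the comparison key while keeping the original spelling of the first occurrence. The function name remove_duplicate_words_ci is hypothetical, and punctuation is still not stripped.

```python
def remove_duplicate_words_ci(sentence: str) -> str:
    """Keep the first occurrence of each word, comparing case-insensitively.

    Hypothetical helper for illustration; punctuation is not handled.
    """
    seen = set()
    kept = []
    for word in sentence.split():
        key = word.casefold()  # "Python" and "python" map to the same key
        if key not in seen:
            seen.add(key)
            kept.append(word)  # keep the spelling of the first occurrence
    return " ".join(kept)


print(remove_duplicate_words_ci("Python is great and python is fun"))
# Python is great and fun
```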

Method 1: Using a Dictionary

This method uses the words of the sentence as keys in a dictionary. Because dictionary keys are unique, a repeated word does not create a new entry, and because dictionaries preserve insertion order (guaranteed since Python 3.7), only the first occurrence of each word is kept, in its original position.

Here’s an example:

sentence = "Python is great and Python is fun"
words = sentence.split()
result = " ".join(dict.fromkeys(words))
print(result)

Output: Python is great and fun

In this example, splitting the sentence produces a list of words, which dict.fromkeys(words) turns into a dictionary whose keys are those words (each mapped to None). Since keys must be unique, repeated words are dropped automatically. Iterating over a dictionary yields its keys in insertion order, so the join() method concatenates the remaining words back into a single string.
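To make the intermediate step visible, the sketch below prints the dictionary that dict.fromkeys() builds before join() flattens it back into a string. It assumes Python 3.7 or later, where dictionary insertion order is a language guarantee.

```python
sentence = "Python is great and Python is fun"
words = sentence.split()

# dict.fromkeys maps every word to None; a repeated word reuses its
# existing key, which keeps the position of its first occurrence.
unique = dict.fromkeys(words)
print(unique)
# {'Python': None, 'is': None, 'great': None, 'and': None, 'fun': None}

# Iterating over a dict yields its keys in insertion order (Python 3.7+).
print(list(unique))
# ['Python', 'is', 'great', 'and', 'fun']

print(" ".join(unique))
# Python is great and fun
```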

Method 2: Using the Ordered Dictionary (collections.OrderedDict)

The collections module’s OrderedDict works like a standard dictionary but explicitly remembers the order in which keys were inserted. Since Python 3.7 a plain dict does this too, so OrderedDict mainly matters for code that must also run on older versions, or when you want to make the reliance on ordering explicit.

Here’s an example:

from collections import OrderedDict
sentence = "Python is great and Python is fun"
words = sentence.split()
result = " ".join(OrderedDict.fromkeys(words))
print(result)

Output: Python is great and fun

By using the OrderedDict.fromkeys() method from the collections module, we create an ordered dictionary that keeps the order of words as they appear in the sentence. Duplicate words are removed since keys in a dictionary are unique.
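A quick way to convince yourself that Methods 1 and 2 are interchangeable on modern Python is to run both on the same input and compare the results. This sketch assumes Python 3.7 or later; on older interpreters the plain-dict result is not guaranteed to keep the original order.

```python
from collections import OrderedDict

sentence = "Python is great and Python is fun"
words = sentence.split()

via_dict = " ".join(dict.fromkeys(words))
via_ordered = " ".join(OrderedDict.fromkeys(words))

# On Python 3.7+ a plain dict already preserves insertion order,
# so both variants produce the same string.
print(via_dict == via_ordered)  # True
print(via_ordered)              # Python is great and fun
```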

Method 3: Using List Comprehension and Set

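Combining a list comprehension with a set gives a compact, single-expression solution. The set tracks the words seen so far, while the list comprehension keeps only the words that are not yet in the set. Note that it relies on a side effect (calling seen.add() inside the comprehension), which some Python developers consider a hack rather than idiomatic style.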

Here’s an example:

sentence = "Python is great and Python is fun"
words = sentence.split()
seen = set()
result = " ".join([seen.add(word) or word for word in words if word not in seen])
print(result)

Output: Python is great and fun

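The set called seen stores the words encountered so far. For each word, the if clause first checks that it is not yet in seen. If the check passes, seen.add(word) records the word and returns None, so the expression seen.add(word) or word evaluates to word, which is placed in the result list. Later duplicates fail the if check, so they are skipped and the original order is preserved.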
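The trick inside the comprehension is easier to see in isolation. The sketch below first shows why seen.add(word) or word yields the word, and then writes the same logic as an explicit loop, which many readers find clearer than the one-line version.

```python
sentence = "Python is great and Python is fun"

seen = set()
print(seen.add("Python"))              # None: set.add() returns None
print(seen.add("Python") or "Python")  # Python: `None or x` evaluates to x

# The same logic as the comprehension, written as an explicit loop.
seen = set()
result_words = []
for word in sentence.split():
    if word not in seen:           # the comprehension's `if` clause
        seen.add(word)             # the side effect hidden in `seen.add(word) or word`
        result_words.append(word)

print(" ".join(result_words))      # Python is great and fun
```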

Method 4: Using Filter and Lambda Function

This method applies a filter to our words to ensure only words that haven’t been encountered are kept. The lambda function within the filter provides a compact way of defining the logic for our conditional check.

Here’s an example:

sentence = "Python is great and Python is fun"
words = sentence.split()
seen = set()
result = " ".join(filter(lambda word: word not in seen and not seen.add(word), words))
print(result)

Output: Python is great and fun

The filter() function calls the lambda for each word. If the word is already in seen, word not in seen is False and the word is skipped. Otherwise seen.add(word) records it and returns None, so not seen.add(word) is True and the word is kept. filter() returns a lazy iterator rather than a list, and the join() method consumes it to construct the sentence without duplicates.
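If the lambda feels too terse, the same filter can use a named predicate. In this sketch the function name is_first_occurrence is hypothetical; it also shows that filter() hands back a lazy iterator, which only populates seen as join() pulls words from it.

```python
sentence = "Python is great and Python is fun"
words = sentence.split()
seen = set()


def is_first_occurrence(word):
    """Return True the first time a word is seen, False afterwards."""
    if word in seen:
        return False
    seen.add(word)
    return True


filtered = filter(is_first_occurrence, words)
print(type(filtered).__name__)  # filter (a lazy iterator, not a list)

result = " ".join(filtered)     # join() drives the iterator to completion
print(result)                   # Python is great and fun
```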

Bonus One-Liner Method 5: Utilizing a Generator Expression

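For maximum brevity, the whole task fits on one line by passing a generator expression straight to dict.fromkeys(). Keep in mind that sentence.split() still builds the full list of words and join() materializes its input, so any memory savings over Method 1 are negligible for a single sentence; the real benefit is conciseness.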

Here’s an example:

sentence = "Python is great and Python is fun"
result = " ".join(dict.fromkeys(word for word in sentence.split()))
print(result)

Output: Python is great and fun

This one-liner passes a generator expression to dict.fromkeys(), which consumes it word by word. It is essentially a condensed version of Method 1, offering the same removal of duplicates and first-occurrence order with even more brevity.
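If memory genuinely matters, for instance when the words come from a large file, the laziness has to start before the text is split. The sketch below uses a generator function (the name unique_words is hypothetical) fed by a generator expression over lines; the list of two lines stands in for a file object. Only the input and output are lazy: the seen set still grows with the number of distinct words.

```python
from typing import Iterable, Iterator


def unique_words(words: Iterable[str]) -> Iterator[str]:
    """Lazily yield each word the first time it appears."""
    seen = set()
    for word in words:
        if word not in seen:
            seen.add(word)
            yield word


# A generator expression produces words one at a time; the list of lines
# stands in for a large input such as an open text file.
lines = ["Python is great", "and Python is fun"]
stream = (word for line in lines for word in line.split())

print(" ".join(unique_words(stream)))  # Python is great and fun
```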

Summary/Discussion

Method 1: Using a Dictionary. Simple and easy to understand. Preserves the order of first occurrences, since dictionaries keep insertion order in Python 3.7 and later.
Method 2: Using the Ordered Dictionary. Removes duplicates and preserves order on any Python version. Requires an import and is typically somewhat slower than a regular dictionary, so it is mostly useful for code that must support versions before 3.7.
Method 3: Using List Comprehension and Set. Concise and preserves the original order, but relies on the side effect of seen.add() inside the comprehension, which can be less readable to those new to Python.
Method 4: Using Filter and Lambda Function. Functional approach, which may appeal to those familiar with functional programming. It could be less intuitive for beginners and may be slightly slower than the list comprehension because of the per-word lambda call.
Method 5: Utilizing a Generator Expression. The most concise option, effectively a one-line version of Method 1. Its memory use is about the same as Method 1's, because split() and join() both build full lists; it trades a little readability for brevity.
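The speed remarks above are rough expectations, not measurements. A minimal harness like the following lets you check that all five approaches agree and time them on your own machine with the standard timeit module. The function names method1 to method5 and the repeated test sentence are hypothetical; method5 passes a generator expression to dict.fromkeys(). Timings vary by interpreter and hardware, so no ranking is assumed.

```python
import timeit
from collections import OrderedDict


def method1(sentence):
    return " ".join(dict.fromkeys(sentence.split()))


def method2(sentence):
    return " ".join(OrderedDict.fromkeys(sentence.split()))


def method3(sentence):
    seen = set()
    return " ".join([seen.add(w) or w for w in sentence.split() if w not in seen])


def method4(sentence):
    seen = set()
    return " ".join(filter(lambda w: w not in seen and not seen.add(w), sentence.split()))


def method5(sentence):
    return " ".join(dict.fromkeys(word for word in sentence.split()))


METHODS = [method1, method2, method3, method4, method5]

sample = "Python is great and Python is fun"
results = {m.__name__: m(sample) for m in METHODS}
# All five methods should agree on the output.
assert len(set(results.values())) == 1, results

# Time each method on a longer input built by repeating the sample words.
long_sentence = " ".join(sample.split() * 1000)
for m in METHODS:
    seconds = timeit.timeit(lambda: m(long_sentence), number=200)
    print(f"{m.__name__}: {seconds:.4f} s")
```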