5 Best Ways to Remove Duplicate Characters from a String in Python

💡 Problem Formulation: When working with strings in Python, you might encounter situations where a string contains duplicate characters that you want to remove. For example, given the input string “aabbccdef”, you would want to remove the duplicates to get the output string “abcdef”.

Method 1: Using a For Loop

This method involves iterating over the characters in the string and building a new string by adding characters that aren’t already contained in it. This is an intuitive approach and easy to understand.

Here’s an example:

def remove_duplicates(input_string):
    result = ""
    for char in input_string:
        # Keep only the first occurrence of each character
        if char not in result:
            result += char
    return result

print(remove_duplicates("banana"))

Output: ‘ban’

This method creates a new string result, iterates over each character in the input string, and appends the character only if result does not already contain it. It is straightforward but slow for long strings with many distinct characters: each membership test (char not in result) scans result from the start, and each concatenation (result += char) can copy the string built so far.
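The two costs described above can be avoided while keeping the same loop structure. The following is a minimal sketch, not part of the original method: it assumes a set for membership tests and a list that is joined once at the end. The names remove_duplicates_fast, seen, and kept are made up for illustration.

```python
def remove_duplicates_fast(input_string):
    seen = set()   # characters already kept; membership test is O(1) on average
    kept = []      # characters in first-occurrence order, joined once at the end
    for char in input_string:
        if char not in seen:
            seen.add(char)
            kept.append(char)
    return "".join(kept)

print(remove_duplicates_fast("banana"))  # ban
```

The result is the same as the plain loop; only the bookkeeping differs.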

Method 2: Using a Dictionary

You can leverage the properties of a dictionary to remove duplicates automatically, because a dictionary cannot have duplicate keys. This method builds a dictionary whose keys are the characters of the string, so each character is stored only once.

Here’s an example:

def remove_duplicates(input_string):
    return ''.join(dict.fromkeys(input_string))

print(remove_duplicates("programming"))

Output: ‘progamin’

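The function uses dict.fromkeys(), which creates a dictionary whose keys are the characters of the input string. Because a dictionary cannot hold duplicate keys, repeated characters collapse into a single key. Dictionaries have preserved insertion order as a language guarantee since Python 3.7, so joining the keys returns the characters in the order they first appeared.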
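To see why this works, it may help to look at the intermediate dictionary before it is joined. The snippet below is a small illustration; the variable name keys is arbitrary.

```python
keys = dict.fromkeys("programming")

# Every character becomes a key; the values default to None.
print(keys)
# {'p': None, 'r': None, 'o': None, 'g': None, 'a': None, 'm': None, 'i': None, 'n': None}

# Iterating over a dictionary yields its keys in insertion order.
print(list(keys))  # ['p', 'r', 'o', 'g', 'a', 'm', 'i', 'n']
```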

Method 3: Using a Set and List

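Sets are collections of unique elements in Python, but they are unordered. You can convert the string to a set to remove duplicates and then restore the original order with sorted(), which returns a list of the unique characters ordered by the position of their first occurrence.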

Here’s an example:

def remove_duplicates(input_string):
    return ''.join(sorted(set(input_string), key=input_string.index))

print(remove_duplicates("apple"))

Output: ‘aple’

This solution first converts the input string to a set to remove duplicates, then sorts the unique characters by the index of their first occurrence in the original string, which restores the original character order. Because input_string.index() scans the string from the start for each unique character, this does more work than Method 2 to reach the same result.
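The sort key is the part that is easiest to misread, so here is a short sketch that makes it visible. The names text, unique_chars, and first_positions are chosen for illustration only.

```python
text = "apple"
unique_chars = set(text)  # unordered: iteration order is not guaranteed

# str.index returns the position of the first occurrence of a character.
first_positions = {char: text.index(char) for char in unique_chars}
print(sorted(first_positions.items(), key=lambda item: item[1]))
# [('a', 0), ('p', 1), ('l', 3), ('e', 4)]

# Sorting by that position restores first-occurrence order.
ordered = sorted(unique_chars, key=text.index)
print("".join(ordered))  # aple
```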

Method 4: Using itertools and groupby

The groupby function from the Python itertools module groups consecutive identical elements. We can use this to collapse each run of a repeated character into a single character.

Here’s an example:

from itertools import groupby

def remove_duplicates(input_string):
    return ''.join(k for k, g in groupby(input_string))

print(remove_duplicates("mississippi"))

Output: ‘misisipi’

This method groups the input string by consecutive identical characters and extracts one character from each group. It only removes consecutive duplicate characters rather than all duplicates, which can be a strength or weakness depending on the use case.
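The difference between consecutive duplicates and all duplicates is easier to see with the groups printed out. This is a small sketch; the variable names are illustrative, and dict.fromkeys() from Method 2 is used only as a point of comparison.

```python
from itertools import groupby

text = "mississippi"

# Each group is one run of identical adjacent characters.
runs = [(key, len(list(group))) for key, group in groupby(text)]
print(runs)
# [('m', 1), ('i', 1), ('s', 2), ('i', 1), ('s', 2), ('i', 1), ('p', 2), ('i', 1)]

consecutive_only = "".join(key for key, _ in groupby(text))
all_duplicates = "".join(dict.fromkeys(text))
print(consecutive_only)  # misisipi
print(all_duplicates)    # misp
```

The letters i and s appear several times in the groupby result because their runs are separated by other characters.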

Bonus One-Liner Method 5: Using Set Comprehension

This single-line approach utilizes a set comprehension within a join function to remove duplicate characters efficiently without maintaining order.

Here’s an example:

print(''.join({char for char in "radar"}))

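Output: ‘rad’ (one possible result; the order can differ between runs)

This is perhaps the briefest method, using a set comprehension to remove duplicates. However, sets are unordered, so it does not preserve the original character order, and the same input can print as ‘rad’, ‘dar’, ‘adr’, or any other arrangement of the three letters.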
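If the original order does not matter but the output should be the same on every run, one option is to sort the unique characters alphabetically. This is a sketch of that variation, not part of the one-liner above; unique_chars is an illustrative name.

```python
unique_chars = {char for char in "radar"}

# The comprehension builds the same set as calling set() directly.
print(unique_chars == set("radar"))   # True

# Sorting gives a deterministic (alphabetical) result.
print("".join(sorted(unique_chars)))  # adr
```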

Summary/Discussion

Method 1: Using a For Loop. Easy to understand and preserves character order. Inefficient for long strings due to repeated string concatenation and linear membership tests.
Method 2: Using a Dictionary. Efficient and preserves character order. Relies on Python 3.7 or later, where dictionary insertion order is guaranteed.
Method 3: Using a Set and List. Removes duplicates while preserving original order. Slower than Method 2 because index() rescans the string for each unique character.
Method 4: Using itertools and groupby. Good for collapsing runs of consecutive duplicates. Does not remove non-consecutive duplicates.
Bonus Method 5: Using Set Comprehension. Very concise. Does not preserve order, and the output order can vary between runs, which might be unacceptable for certain applications.
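The efficiency notes above can be checked on your own machine. The following is a rough sketch using the standard timeit module; the function names, the sample string, and the repeat count are arbitrary choices for illustration, and the timings will vary by system and Python version.

```python
import timeit

def loop_version(s):
    result = ""
    for char in s:
        if char not in result:
            result += char
    return result

def dict_version(s):
    return "".join(dict.fromkeys(s))

def sorted_set_version(s):
    return "".join(sorted(set(s), key=s.index))

# Hypothetical test input: a sentence repeated to make the string longer.
sample = "the quick brown fox jumps over the lazy dog" * 200

candidates = (loop_version, dict_version, sorted_set_version)

# The three order-preserving methods should return the same string.
results = {func.__name__: func(sample) for func in candidates}
assert len(set(results.values())) == 1

for func in candidates:
    seconds = timeit.timeit(lambda: func(sample), number=50)
    print(f"{func.__name__}: {seconds:.4f} s for 50 runs")
```

Methods 4 and 5 are left out of the comparison because they do not produce the same output as the other three.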