5 Best Ways to Remove Consecutive Duplicates in Python

💡 Problem Formulation: Removing consecutive duplicates means transforming a sequence (often a string or list) by collapsing each run of adjacent, equal elements into a single element. For instance, given the input 'aaabbbcaaad', the desired output is 'abcad'. The challenge is to do this efficiently while leaving non-adjacent repeats untouched, such as the second run of 'a' in the example.

Method 1: Using itertools.groupby()

Python’s itertools module contains a function called groupby() that can be used to remove consecutive duplicates. It groups adjacent identical elements, so a deduplicated list or string can be built by keeping one element per group. Its interface is straightforward: it takes an iterable (and an optional key function) and yields (key, group) pairs, one pair for each run of consecutive elements that share the same key.

Here’s an example:

from itertools import groupby

input_string = "aaabbbcaaad"
deduplicated = ''.join(k for k, g in groupby(input_string))

print(deduplicated)

Output:

abcad

This snippet uses groupby() from the itertools module to process input_string. The generator expression iterates over the (key, group) pairs that groupby() yields, but only the keys (the individual characters) are used to build the new, deduplicated string.
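The same idea is not limited to strings. Below is a minimal sketch, assuming a hypothetical helper named dedupe_consecutive (the name and the optional key parameter are illustrative, not part of the original snippet), that accepts any iterable and passes an optional key function through to groupby():

```python
from itertools import groupby


def dedupe_consecutive(iterable, key=None):
    """Return a list with each run of equal adjacent items collapsed to one item.

    `dedupe_consecutive` is a hypothetical helper name used for illustration.
    When `key` is given, items are compared by key(item) and the first item
    of each run is kept.
    """
    return [next(group) for _, group in groupby(iterable, key=key)]


print(dedupe_consecutive([1, 1, 2, 2, 2, 3, 1, 1]))   # [1, 2, 3, 1]
print(dedupe_consecutive("AaaBbA", key=str.lower))     # ['A', 'B', 'A']
```

Taking the first item of each group, rather than the key, matters once a key function is involved, because the key may differ from the original element.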

Method 2: Loop with Comparison

Another approach is to iterate over the sequence manually and compare each element with the one before it. Whenever the current element differs from its predecessor, it is appended to the result. This is a basic algorithm that does not require any additional libraries.

Here’s an example:

input_string = "aaabbbcaaad"
result = input_string[:1]  # slicing avoids an IndexError when the input is empty

for i in range(1, len(input_string)):
    if input_string[i] != input_string[i - 1]:
        result += input_string[i]

print(result)

Output:

abcad

In this code, we initialize the result with the first character of the input. We then loop through the string starting from the second character, each time comparing the current character with the previous one and appending it to the result if they differ.
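The same comparison can also be written as a generator, which avoids repeated string concatenation and works on any iterable. This is a minimal sketch; the function name iter_without_consecutive_duplicates is hypothetical and chosen only for illustration:

```python
def iter_without_consecutive_duplicates(iterable):
    """Yield items from `iterable`, skipping any item equal to the one before it."""
    previous = object()  # unique sentinel: compares unequal to any real item
    for item in iterable:
        if item != previous:
            yield item
        previous = item


print(''.join(iter_without_consecutive_duplicates("aaabbbcaaad")))  # abcad
print(list(iter_without_consecutive_duplicates([])))                 # []
```

Because the function never indexes into its input, it handles empty inputs and one-shot iterators such as file objects without special cases.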

Method 3: Regular Expressions

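Regular expressions are a powerful tool for string manipulation and can be applied to remove consecutive duplicates. The Python re module provides regular expression operations, and the pattern (.)\1+ matches a character followed by one or more copies of itself: (.) captures a single character and the backreference \1 matches that same character again. Note that . does not match a newline unless the re.DOTALL flag is passed.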

Here’s an example:

import re

input_string = "aaabbbcaaad"
deduplicated = re.sub(r'(.)\1+', r'\1', input_string)  # \1 is a backreference to group 1

print(deduplicated)

Output:

abcad

This snippet makes use of the sub() method from the re module to replace all instances of a character followed by itself one or more times with just a single instance of that character, thus removing consecutive duplicates.
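Two variations are worth knowing when the input is more than a single line of letters. The sketch below uses an assumed sample string (not from the original article) to show how the re.DOTALL flag changes the handling of newlines, and how a narrower pattern can collapse only runs of spaces:

```python
import re

text = "line one\n\n\nline   two!!"

# Without re.DOTALL, "." does not match "\n", so the blank lines stay.
no_dotall = re.sub(r'(.)\1+', r'\1', text)
print(repr(no_dotall))     # 'line one\n\n\nline two!'

# With re.DOTALL, runs of newlines are collapsed as well.
with_dotall = re.sub(r'(.)\1+', r'\1', text, flags=re.DOTALL)
print(repr(with_dotall))   # 'line one\nline two!'

# Restricting the pattern to spaces leaves other repeated characters alone.
spaces_only = re.sub(r' {2,}', ' ', text)
print(repr(spaces_only))   # 'line one\n\n\nline two!!'
```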

Method 4: Using collections.deque

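The deque (double-ended queue) from the collections module can serve as the buffer in which the deduplicated sequence is built. A deque supports fast appends and pops at both ends. This method only appends on the right and inspects the last element, so a plain list would behave the same way here; the deque becomes useful if the algorithm is later extended to also remove elements from the left.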

Here’s an example:

from collections import deque

input_string = "aaabbbcaaad"
result = deque()
for char in input_string:
    if not result or char != result[-1]:
        result.append(char)

print(''.join(result))

Output:

abcad

The code example initializes a deque object and iterates over each character in the input string. It appends a character to the deque only if the deque is empty or the character differs from the last one in the deque, effectively removing consecutive duplicates.
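Since this approach only appends on the right and reads the last element, a plain list supports the same operations. The following is a minimal sketch of that variant; dedupe_with_list is a hypothetical function name introduced here for illustration:

```python
def dedupe_with_list(text):
    """Collapse runs of equal adjacent characters using a plain list as the buffer."""
    result = []
    for char in text:
        # result[-1] is the most recently kept character.
        if not result or char != result[-1]:
            result.append(char)
    return ''.join(result)


print(dedupe_with_list("aaabbbcaaad"))  # abcad
```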

Bonus One-Liner Method 5: List Comprehension with zip()

The zip() function can be used in a creative one-liner that pairs each element with the one that follows it. A generator expression then compares the two elements of each pair and filters out consecutive duplicates.

Here’s an example:

input_string = "aaabbbcaaad"
# None is a sentinel partner for the last character, so a trailing space is not lost.
deduplicated = ''.join(a for a, b in zip(input_string, [*input_string[1:], None]) if a != b)

print(deduplicated)

Output:

abcad

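This snippet uses zip() to create pairs of consecutive characters and a generator expression to build a string that includes only the first character of a pair when the two characters differ. In effect, the last character of each run is kept. The second sequence is padded with one extra element at the end so that the final character also gets a partner to be compared against.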
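The pairing idea carries over to lists, where no padding is needed if the comparison is turned around: keep the first element, then keep every element that differs from its predecessor. This is a minimal sketch under that assumption; dedupe_list_with_zip is a hypothetical name used for illustration:

```python
def dedupe_list_with_zip(items):
    """Keep the first item, then every item that differs from its predecessor."""
    return items[:1] + [b for a, b in zip(items, items[1:]) if a != b]


print(dedupe_list_with_zip([1, 1, 2, 3, 3, 1]))  # [1, 2, 3, 1]
print(dedupe_list_with_zip([]))                  # []
```

Slicing with items[:1] rather than indexing with items[0] keeps the function safe for empty lists.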

Summary/Discussion

  • Method 1: itertools.groupby(). Simple and elegant, and works on any iterable, not only strings. May be less readable for those unfamiliar with the itertools module.
  • Method 2: Loop with Comparison. Easy to understand and implement. Could be less efficient for very long sequences, because strings are immutable and concatenating inside a loop may copy the partial result repeatedly.
  • Method 3: Regular Expressions. Concise, but limited to strings. Can be cryptic for those not well-versed in regex, and the . wildcard skips newlines unless re.DOTALL is passed. Potentially less performant for simple cases.
  • Method 4: collections.deque. Runs in linear time and avoids repeated string concatenation, although a plain list would serve equally well because only right-end appends are used. Uses extra space for the buffer.
  • Bonus Method 5: List Comprehension with zip(). An elegant one-liner. However, it might take longer to understand for Python beginners, and the slice input_string[1:] creates a copy of nearly the entire input.
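The performance remarks above are qualitative, and actual timings depend on the Python version and the input. One way to check them is a small harness such as the sketch below. The function names, sample inputs, and repeat counts are assumptions chosen for illustration, and each function is a variant of the corresponding approach rather than a verbatim copy of the snippets above. It first confirms that all five approaches agree on a few edge cases and then times them with timeit:

```python
import re
import timeit
from collections import deque
from itertools import groupby


def with_groupby(s):
    return ''.join(k for k, _ in groupby(s))


def with_loop(s):
    result = s[:1]
    for i in range(1, len(s)):
        if s[i] != s[i - 1]:
            result += s[i]
    return result


def with_regex(s):
    # re.DOTALL lets "." match newlines too, so all methods agree on "\n" runs.
    return re.sub(r'(.)\1+', r'\1', s, flags=re.DOTALL)


def with_deque(s):
    result = deque()
    for char in s:
        if not result or char != result[-1]:
            result.append(char)
    return ''.join(result)


def with_zip(s):
    # Sentinel-free variant: keep the first character, then each character
    # that differs from its predecessor.
    return s[:1] + ''.join(b for a, b in zip(s, s[1:]) if a != b)


METHODS = [with_groupby, with_loop, with_regex, with_deque, with_zip]
SAMPLES = ["", "a", "aaabbbcaaad", "ab  cd ", "x\n\ny"]

# Every method should produce the same output for every sample.
for sample in SAMPLES:
    outputs = {method(sample) for method in METHODS}
    assert len(outputs) == 1, (sample, outputs)

# Timings vary by machine and interpreter, so no expected numbers are shown.
long_input = "aaabbbcaaad" * 2_000
for method in METHODS:
    seconds = timeit.timeit(lambda: method(long_input), number=5)
    print(f"{method.__name__:<14}{seconds:.4f} s")
```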