5 Best Ways to Perform String Compression in Python

💡 Problem Formulation: String compression is a common programming challenge where the goal is to reduce the size of a string by replacing consecutive repeats of characters with the character followed by the count of repeats. For instance, the input 'aabcccccaaa' should yield an output of 'a2b1c5a3'.

Method 1: Using Loops

This method involves iterating through the string with a loop and building a compressed string by comparing each character to the next one. If the characters are the same, we increase the count; if they’re different, we append the character and its count to the result string and reset the count.

Here’s an example:

def compress_string(s):
    compressed = []
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            compressed.append(s[i - 1] + str(count))
            count = 1
    compressed.append(s[-1] + str(count))
    return ''.join(compressed)

print(compress_string('aabcccccaaa'))

Output: 'a2b1c5a3'

This code defines a function compress_string() that takes a string as an input and uses a loop to construct the compressed version. It checks adjacent characters and counts consecutive characters, appending them to a list once a different character is encountered, finally joining the list to form the compressed string.

Method 2: Using itertools.groupby

The itertools.groupby function can efficiently group consecutive elements in a string. By grouping identical characters together, we can easily compute the count of each group and construct the compressed string.

Here’s an example:

import itertools

def compress_string(s):
    return ''.join([char + str(sum(1 for _ in group)) for char, group in itertools.groupby(s)])

print(compress_string('aabcccccaaa'))

Output: 'a2b1c5a3'

This snippet uses the itertools.groupby() function to create an iterator that returns consecutive keys and groups from the string. We then iterate through these groups, construct a compressed format of the key and the length of the group, and join them into a resulting string.

Method 3: Using Regular Expressions

The regular expression (regex) library can find all consecutive characters. Using regex, we can match these groups and then build the compressed string by appending the character followed by the length of the match.

Here’s an example:

import re

def compress_string(s):
    return ''.join([match.group(0)[0] + str(len(match.group(0))) for match in re.finditer(r'(.)\1*', s)])

print(compress_string('aabcccccaaa'))

Output: 'a2b1c5a3'

The code uses the regex pattern '(.)\1*' to find all sequential repeats of characters in the string. The re.finditer() function is used to get an iterator over all non-overlapping matches in the string, each represented as a match object. The matches are processed to get the character and its consecutive count.

Method 4: Using Recursion

Recursion can be used for string compression by having a function call itself with smaller parts of the original string until the entire string has been compressed, combining the character count and compressing sub-parts as it backtracks.

Here’s an example:

def compress_string(s):
    if not s:
        return ""
    count = 1
    i = 1
    while i < len(s) and s[i] == s[i - 1]:
        i += 1
        count += 1
    return s[0] + str(count) + compress_string(s[i:])

print(compress_string('aabcccccaaa'))

Output: 'a2b1c5a3'

This function, compress_string(), calls itself with the remaining substring once it counts the consecutive characters. Each call deals with the first occurrence of consecutive characters, updates the count, then concatenates and passes on the remainder of the string to itself.

Bonus One-Liner Method 5: Using functools.reduce

The functools.reduce() function can be utilized to accumulate results along a sequence. It can compress a string by applying a function of two arguments cumulatively to the characters of the string from left to right.

Here’s an example:

from functools import reduce

def compress_string(s):
    return reduce(lambda acc, c: (acc[:-1] + acc[-1][0] + str(int(acc[-1][1:]) + 1)) if c == acc[-2] else acc + c + '1', s+' ', '')

print(compress_string('aabcccccaaa'))

Output: 'a2b1c5a3'

This one-liner uses functools.reduce() to traverse the string, keeping an accumulated result that appends a character and ‘1’ when the character is different from the last counted. When two characters match, it updates the count. This is a condensed, yet clever approach to string compression.

Summary/Discussion

Method 1: Using Loops. This approach is simple and intuitive. It works well for all sizes of input strings. However, it can be more verbose than other methods, and not as performant for very large strings.
Method 2: Using itertools.groupby. Groupby provides a very Pythonic way of solving the problem, with terse and readable code. The downside is it’s not as memory efficient since it requires importing an additional module.
Method 3: Using Regular Expressions. Regex is powerful for pattern matching but can be overkill for this problem. It’s also less readable for those not familiar with regex syntax, and potentially less performant due to the overhead of the regex engine.
Method 4: Using Recursion. Recursion is elegant but not always clear to beginners. It can also lead to a stack overflow for very large strings and is generally less efficient due to function call overhead.
Bonus Method 5: Using functools.reduce. A concise one-liner that is clever but may sacrifice readability. It’s more of a fun solution than a practical one, and may not be intuitive to those who aren’t familiar with functional programming concepts.