5 Best Ways to Filter Supersequence Strings in Python

💡 Problem Formulation: We face a common challenge when we want to filter out supersequence strings from a list. A supersequence of a string is a sequence that contains the string as a subsequence. For instance, given a list of strings like [“apple”, “app”, “apricot”, “stone”], we want to retain “app” and “stone” because “apple” is a supersequence of “app” and “apricot” is unrelated.

Method 1: Using List Comprehensions with ‘all()’ Function

This method employs a list comprehension that iterates over all strings in the list and uses the all() function to ensure that no other string is a supersequence of the current string. It’s a Pythonic way to succinctly express the filter condition, making it well-suited for readability and conciseness.

Here’s an example:

strings = ["apple", "app", "apricot", "stone"]

filtered_strings = [
    s for s in strings if not any(s != other and s in other for other in strings)
]

print(filtered_strings)

Output:

['app', 'apricot', 'stone']

The list comprehension iterates through each string and checks it against all others in the list to ensure it’s not a subsequence. Strings that don’t match the condition of being a subsequence remain in the filtered list.

Method 2: Using a Custom Function

This method defines a custom function is_subsequence() to determine whether one string is a subsequence of another. The function helps to cleanly separate the concern of supersequence detection from the filtering operation, improving code maintainability and testability.

Here’s an example:

def is_subsequence(sub, superstr):
    it = iter(superstr)
    return all(char in it for char in sub)

strings = ["apple", "app", "apricot", "stone"]

filtered_strings = [
    s for s in strings if not any(s != other and is_subsequence(s, other) for other in strings)
]

print(filtered_strings)

Output:

['app', 'apricot', 'stone']

This code snippet defines a function is_subsequence() which uses an iterator to efficiently determine if one string is a subsequence of another. This function is then used within a list comprehension to filter out any supersequences.

Method 3: Using a Trie Data Structure

Building a Trie data structure allows efficient filtering of supersequence strings. The Trie represents all strings in a tree-like structure, making it simple to detect and eliminate supersequences with quick prefix checks. This method optimizes the operation for larger sets of strings.

Here’s an example:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

def insert(root, string):
    node = root
    for char in string:
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end_of_word = True

def is_supersequence(root, string):
    node = root
    for char in string:
        if char not in node.children or node.is_end_of_word:
            return False
        node = node.children[char]
    return True

strings = ["apple", "app", "apricot", "stone"]
root = TrieNode()
for s in sorted(strings, key=len, reverse=True):
    if not is_supersequence(root, s):
        insert(root, s)

filtered_strings = [s for s in strings if not is_supersequence(root, s)]

print(filtered_strings)

Output:

['app', 'apricot', 'stone']

This code creates a Trie from the given strings and checks for supersequences during insertion. Strings that do not return as supersequences from the is_supersequence() function are added to the output list.

Method 4: Using Filter and Lambda Functions

Python’s built-in filter() function paired with a lambda allows for an inline, functional approach to filtering strings. Here, the filtering logic is similar to previous methods but put concisely in a single expression, harnessing the power of lambda functions.

Here’s an example:

strings = ["apple", "app", "apricot", "stone"]
filtered_strings = list(filter(
    lambda s: not any(s != other and s in other for other in strings), strings
))

print(filtered_strings)

Output:

['app', 'apricot', 'stone']

The lambda inside the filter() function is executed for each element in the list, effectively filtering out any strings that are subsequences of others. The resulting iterator is then converted back to a list.

Bonus One-Liner Method 5: Using set Difference

A one-liner method utilizes set difference to filter out subsequences by first building sets of indices representing potential subsequences and then subtracting them to get only the non-supersequences. This innovative and compact approach leverages Python’s robust set operations.

Here’s an example:

strings = ["apple", "app", "apricot", "stone"]

indices = {i for i, s in enumerate(strings)}
superseq_indices = {i for i, s1 in enumerate(strings) for j, s2 in enumerate(strings) if i != j and s1 in s2}
filtered_strings = [strings[i] for i in indices - superseq_indices]

print(filtered_strings)

Output:

['app', 'apricot', 'stone']

The method first creates a set of all indices and a set of indices that belong to supersequence strings. The difference of these sets gives the indices of the desired strings, which are then used to build the final filtered list.

Summary/Discussion

Method 1: List Comprehensions with ‘all()’ Function. It is straightforward and readable. However, in cases with large data sets, it may not be the most efficient.
Method 2: Custom Function. Separates concerns and improves code clarity, but can be slightly less efficient due to function call overheads.
Method 3: Trie Data Structure. Ideal for large data sets as it improves efficiency, but it comes with an increased complexity in implementation.
Method 4: Filter and Lambda Functions. Offers a functional approach and concise code but might be less intuitive for those unfamiliar with lambdas.
Method 5: Set Difference One-Liner. It is incredibly concise and leverages powerful set operations, but could be less readable due to its compactness.