5 Best Ways to Find the Number of Distinct Subsequences in Python

💡 Problem Formulation: Counting the distinct subsequences of a string is a classic computational problem. A subsequence is formed by deleting zero or more characters while keeping the remaining characters in their original order, and the challenge is to count each resulting string only once even when it can be formed in several ways. For example, given the input string “ABA”, the distinct non-empty subsequences are “A”, “B”, “AA”, “AB”, “BA”, and “ABA”, a total of 6. (Counting the empty subsequence “” as well would give 7.)
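To make the definition concrete, here is a minimal sketch that checks whether one string is a subsequence of another. The helper name is_subsequence is hypothetical and is not used by the methods in this article; it relies on the fact that a membership test on an iterator consumes items up to and including the match.

```python
def is_subsequence(candidate, s):
    """Return True if candidate can be formed by deleting characters from s."""
    remaining = iter(s)
    # Each membership test advances the iterator, so the order of s is enforced.
    return all(ch in remaining for ch in candidate)


print(is_subsequence("BA", "ABA"))  # True: uses positions 1 and 2
print(is_subsequence("BB", "ABA"))  # False: only one "B" is available
```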

Method 1: Recursive Approach

The recursive approach generates every subsequence by either including or excluding each character in turn, and stores the results in a set so that duplicates collapse into one entry. It is easy to follow, but it explores all 2^n include/exclude choices for a string of length n, so it is practical only for short strings.

Here’s an example:

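def find_distinct_subsequences(s):
    subsequences = set()

    def recurse(index, path):
        # Base case: every character has been either included or excluded
        if index == len(s):
            subsequences.add(path)
            return
        recurse(index + 1, path + s[index])  # include s[index]
        recurse(index + 1, path)             # exclude s[index]

    recurse(0, "")
    subsequences.discard("")  # the empty subsequence is not counted
    return len(subsequences)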

print(find_distinct_subsequences("ABA"))

Output: 6

The helper function recurse takes an index and a path and explores all possible subsequences by either including the current character (appending it to the path) or excluding it. The base case occurs when the index equals the string’s length, at which point the current path is added to a set. Because a set stores each value once, a string that can be formed in more than one way, such as “A” in “ABA”, contributes a single entry.
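To see why the set is needed, the following sketch counts how many include/exclude choices lead to each string. It enumerates the same 2^n decisions iteratively with itertools.product rather than recursion; the function name subsequence_multiplicities is hypothetical.

```python
from collections import Counter
from itertools import product


def subsequence_multiplicities(s):
    """Count how many include/exclude choices produce each non-empty string."""
    counts = Counter()
    # Each tuple of booleans is one complete series of include/exclude decisions.
    for choices in product([False, True], repeat=len(s)):
        path = "".join(ch for ch, keep in zip(s, choices) if keep)
        if path:
            counts[path] += 1
    return counts


counts = subsequence_multiplicities("ABA")
print(counts["A"])  # 2: "A" can come from position 0 or position 2
print(len(counts))  # 6 distinct non-empty subsequences
```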

Method 2: Dynamic Programming

Dynamic programming solves a problem by combining the answers to smaller subproblems. Here, the number of distinct subsequences of each prefix is computed from the count for the previous prefix: appending a character doubles the count, and the subsequences already produced by the previous occurrence of the same character are subtracted. The table is filled in a single pass over the string, so this approach is substantially more efficient for larger inputs than enumerating every subsequence.

Here’s an example:

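def distinct_subsequences_dp(s):
    MOD = 10**9 + 7  # keeps the counts bounded for long strings
    n = len(s)
    dp = [1] * (n + 1)  # dp[0] = 1 accounts for the empty subsequence

    last_occurrence = {}  # character -> 1-based position of its latest occurrence
    for i in range(1, n + 1):
        dp[i] = (2 * dp[i - 1]) % MOD
        if s[i - 1] in last_occurrence:
            # Subtract the subsequences already counted at the previous occurrence
            dp[i] = (dp[i] - dp[last_occurrence[s[i - 1]] - 1]) % MOD
        last_occurrence[s[i - 1]] = i

    return (dp[n] - 1) % MOD  # exclude the empty subsequence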

print(distinct_subsequences_dp("ABA"))

Output: 6

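The dynamic programming solution builds an array dp in which dp[i] is the number of distinct subsequences of the first i characters, with the empty subsequence included. Appending a character doubles the previous count; if that character has occurred before, the count from just before its previous occurrence is subtracted, because those subsequences would otherwise be counted twice. For “ABA” the table is 1, 2, 4, 7, and subtracting 1 for the empty subsequence gives 6. The result is reported modulo 10**9 + 7.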
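The table can be inspected directly. The sketch below uses the same recurrence but omits the modulus so that the values match a hand computation; dp_table is a hypothetical helper name introduced only for this illustration.

```python
def dp_table(s):
    """Return dp, where dp[i] counts distinct subsequences of s[:i] (empty one included)."""
    dp = [1] * (len(s) + 1)
    last_occurrence = {}  # character -> 1-based position of its latest occurrence
    for i, ch in enumerate(s, start=1):
        dp[i] = 2 * dp[i - 1]
        if ch in last_occurrence:
            # Remove the strings that were already counted when ch last appeared.
            dp[i] -= dp[last_occurrence[ch] - 1]
        last_occurrence[ch] = i
    return dp


print(dp_table("ABA"))  # [1, 2, 4, 7]
```

The last entry minus 1 is the number of distinct non-empty subsequences.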

Method 3: Bit-manipulation

Bit manipulation uses the binary representation of an integer to encode which positions of the string are selected: bit i of a mask is set when the character at index i is included. Iterating over every mask from 1 to 2^n - 1 enumerates every non-empty selection of positions, and a set removes the duplicates. This replaces recursion with a simple loop, but the amount of work is still exponential in the length of the string, so it suits short inputs only.

Here’s an example:

def distinct_subsequences_bits(s):
    n = len(s)
    subsequences = set()
    # Each mask from 1 to 2**n - 1 selects a non-empty set of positions
    for mask in range(1, 1 << n):
        # Keep s[i] when bit i of the mask is set
        subsequences.add("".join(s[i] for i in range(n) if mask & (1 << i)))
    return len(subsequences)

print(distinct_subsequences_bits("ABA"))

Output: 6

In the bit-manipulation method, each bit of the integer mask records whether the character at the corresponding position is included. For “ABA”, the masks 001 and 100 both produce “A”, so the set keeps a single copy, and the final size of the set is the number of distinct non-empty subsequences.
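One way to read a bitmask as a subsequence is to let bit i stand for position i of the string. The sketch below decodes a few masks for “ABA” with a hypothetical helper, mask_to_subsequence, and shows two different masks producing the same string, which is why the distinct results have to be collected in a set.

```python
def mask_to_subsequence(s, mask):
    """Build the subsequence of s whose positions are the set bits of mask."""
    return "".join(ch for i, ch in enumerate(s) if mask & (1 << i))


# Bit 0 is the rightmost binary digit and corresponds to s[0].
for mask in (0b001, 0b100, 0b101, 0b110):
    print(format(mask, "03b"), mask_to_subsequence("ABA", mask))
# 001 A
# 100 A
# 101 AA
# 110 BA
```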

Method 4: Using itertools.combinations

This method uses Python’s itertools.combinations to generate every selection of characters for each possible length. Because combinations keeps the elements in input order, each selection is a valid subsequence, and adding the results to a set removes duplicates. It is simple to implement, but the number of selections grows exponentially with the length of the string, so it slows down quickly on longer inputs.

Here’s an example:

from itertools import combinations

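def distinct_subsequences_itertools(s):
    subsequences = set()
    # Start at length 1 so the empty subsequence is not counted
    for length in range(1, len(s) + 1):
        for comb in combinations(s, length):
            subsequences.add(comb)  # tuples are hashable, so the set deduplicates them
    return len(subsequences)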

print(distinct_subsequences_itertools("ABA"))

Output: 6

The function uses combinations to generate the subsequences of every length as tuples of characters. combinations selects elements by position, so repeated characters can produce equal tuples; storing the tuples in a set ensures that each distinct subsequence is counted once. The size of the set is the number of distinct subsequences.
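The duplicate tuples are easy to observe on a small input. This short sketch prints what combinations yields for “ABA” and, as an optional extra step, joins the tuples into ordinary strings.

```python
from itertools import combinations

# combinations picks by position, so the two "A" characters yield equal tuples.
print(list(combinations("ABA", 1)))  # [('A',), ('B',), ('A',)]
print(list(combinations("ABA", 2)))  # [('A', 'B'), ('A', 'A'), ('B', 'A')]

# Joining each tuple gives plain strings if the subsequences themselves are needed.
distinct_pairs = {"".join(comb) for comb in combinations("ABA", 2)}
print(sorted(distinct_pairs))  # ['AA', 'AB', 'BA']
```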

Bonus One-Liner Method 5: Using Functional Programming

The functional programming approach expresses the enumeration with lambdas and set operations instead of explicit loops and mutable state. A recursive lambda builds the set of subsequences by choosing each character in turn as the first one and recursing on the remainder of the string. The result is compact but less readable, and it is inefficient because the same suffixes are recomputed many times.

Here’s an example:

all_subsequences = lambda s: {''}.union(*({c + sub for sub in all_subsequences(s[i+1:])} for i, c in enumerate(s)))
distinct_subsequences_fp = lambda s: len(all_subsequences(s)) - 1  # exclude the empty subsequence

print(distinct_subsequences_fp("ABA"))

Output: 6

The recursion has to return a set rather than a count, so it lives in its own lambda, all_subsequences, and a second lambda converts the set into a count. For each position i, the recursive lambda prefixes the character at that position to every subsequence of the remaining suffix. The union operation merges these sets with the empty string, ensuring uniqueness.
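The same set-building idea can also be written without recursion by folding over the characters. The following is a sketch using functools.reduce, not a replacement for the code above; the name distinct_subsequences_reduce is hypothetical, and the count excludes the empty subsequence.

```python
from functools import reduce


def distinct_subsequences_reduce(s):
    """Count distinct non-empty subsequences by folding over the characters."""
    # For each character, keep every string built so far and add a copy
    # of each with the character appended.
    all_subsequences = reduce(
        lambda seen, ch: seen | {sub + ch for sub in seen},
        s,
        {""},
    )
    return len(all_subsequences) - 1  # drop the empty subsequence


print(distinct_subsequences_reduce("ABA"))  # 6
```

The intermediate set can still hold an exponential number of strings, so this variant is also meant for short inputs.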

Summary/Discussion

Method 1: Recursive Approach. Easy to understand. Not suitable for large strings because it explores all 2^n include/exclude choices.
Method 2: Dynamic Programming. Runs in a single pass over the string, so it scales to large inputs. The subtraction that removes duplicates takes more effort to understand and implement.
Method 3: Bit-manipulation. Replaces recursion with integer bitmasks, but enumerating masks is still exponential in the string length. Not as intuitive for those unfamiliar with bitwise operations.
Method 4: Using itertools.combinations. Simple to write but slow for long strings because it generates every combination before the set filters out duplicates.
Method 5: Functional Programming. Compact and built entirely from lambdas and set operations. Less readable and inefficient for large strings because the recursion repeats work.
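
Since the methods differ mainly in cost, one way to gain confidence in the fast one is to compare it against exhaustive enumeration on short random strings. The sketch below is self-contained: brute_force and linear_dp are hypothetical names, and linear_dp is a variant of the dynamic programming idea that keeps only a running total and omits the modulus.

```python
import random
from itertools import combinations


def brute_force(s):
    """Enumerate every non-empty subsequence and count the distinct ones."""
    return len({comb for r in range(1, len(s) + 1) for comb in combinations(s, r)})


def linear_dp(s):
    """Count distinct non-empty subsequences in one pass (no modulus applied)."""
    total = 1  # distinct subsequences so far, including the empty one
    before_last = {}  # character -> value of total just before its latest occurrence
    for ch in s:
        previous_total = total
        total = 2 * total - before_last.get(ch, 0)
        before_last[ch] = previous_total
    return total - 1  # drop the empty subsequence


random.seed(0)  # fixed seed so the run is repeatable
for _ in range(200):
    s = "".join(random.choice("AB") for _ in range(random.randint(0, 10)))
    assert brute_force(s) == linear_dp(s), s
print("all checks passed")
```

A two-letter alphabet is used on purpose: it produces many repeated characters, which is the case where duplicate handling matters.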