5 Best Ways to Find the Length of the Longest Common Subsequence of Three Strings in Python

Rate this post

πŸ’‘ Problem Formulation: The task is to devise a program that calculates the length of the longest common subsequence (LCS) shared by three given strings. For instance, given strings “abcdefgh”, “abefgk”, and “bcegkl”, a longest common subsequence is “beg”, with a length of 3. This article explores five different methods to solve this problem in Python.

Method 1: Recursive Approach

The Recursive Approach involves breaking down the problem into smaller instances of the same problem. It checks each character in the three strings and recursively finds the LCS by considering two scenarios: when the characters are part of LCS and when they are not. This method is defined by a function lcs_recursive(str1, str2, str3, i, j, k), where i, j, and k are the indices of the characters in str1, str2, and str3 respectively.

Here’s an example:

def lcs_recursive(str1, str2, str3, i, j, k):
    if i == 0 or j == 0 or k == 0:
        return 0
    elif str1[i-1] == str2[j-1] and str1[i-1] == str3[k-1]:
        return 1 + lcs_recursive(str1, str2, str3, i-1, j-1, k-1)
    else:
        return max(lcs_recursive(str1, str2, str3, i-1, j, k),
                   lcs_recursive(str1, str2, str3, i, j-1, k),
                   lcs_recursive(str1, str2, str3, i, j, k-1))

print(lcs_recursive("abcdefgh", "abefgk", "bcegkl", 8, 6, 6))

Output: 3

This code defines a recursive function that compares characters from the back of each string and either increments the count by 1 (if all the current characters match) or moves to the next set of characters. The complexity of this approach is high, making it less efficient for long strings.

Method 2: Memoization

Memoization improves the recursive approach by storing the results of subproblems to avoid redundant computations. This approach introduces a three-dimensional matrix for memoization and defines a function lcs_memoization(str1, str2, str3) that utilizes this matrix to store and reuse the length of LCS at specific indices.

Here’s an example:

def lcs_memoization(str1, str2, str3):
    l1, l2, l3 = len(str1), len(str2), len(str3)
    dp = [[[None for k in range(l3+1)] for j in range(l2+1)] for i in range(l1+1)]
    
    def lcs(i, j, k):
        if i == 0 or j == 0 or k == 0:
            return 0
        if dp[i][j][k] is not None:
            return dp[i][j][k]

        if str1[i-1] == str2[j-1] and str1[i-1] == str3[k-1]:
            dp[i][j][k] = 1 + lcs(i-1, j-1, k-1)
        else:
            dp[i][j][k] = max(lcs(i-1, j, k), lcs(i, j-1, k), lcs(i, j, k-1))
        return dp[i][j][k]

    return lcs(l1, l2, l3)

print(lcs_memoization("abcdefgh", "abefgk", "bcegkl"))

Output: 3

This snippet defines a helper function within lcs_memoization() to recurse through the string lengths. It uses a memoization table dp to store intermediate results. This method significantly reduces the time complexity as compared to the pure recursive approach.

Method 3: Dynamic Programming

Dynamic Programming (DP) builds upon memoization but uses iterative bottom-up computation. It removes recursion overhead by filling up a 3D DP table iteratively using the function lcs_dynamic_programming(str1, str2, str3). The table’s values represent the length of LCS up to those indices in the strings.

Here’s an example:

def lcs_dynamic_programming(str1, str2, str3):
    l1, l2, l3 = len(str1), len(str2), len(str3)
    dp = [[[0 for k in range(l3+1)] for j in range(l2+1)] for i in range(l1+1)]

    for i in range(1, l1+1):
        for j in range(1, l2+1):
            for k in range(1, l3+1):
                if str1[i-1] == str2[j-1] and str1[i-1] == str3[k-1]:
                    dp[i][j][k] = dp[i-1][j-1][k-1] + 1
                else:
                    dp[i][j][k] = max(dp[i-1][j][k], dp[i][j-1][k], dp[i][j][k-1])
    return dp[l1][l2][l3]

print(lcs_dynamic_programming("abcdefgh", "abefgk", "bcegkl"))

Output: 3

This function creates a DP table and iteratively computes the lengths of subsequences from indices (1,1,1) to (l1,l2,l3). It results in a LCS length without redundant calculations, thus it’s more time-efficient than the recursive with memoization approach.

Method 4: Space-Optimized DP

A Space-Optimized Dynamic Programming method reduces memory usage by keeping only the necessary parts of the DP table. This method uses two matrices that are alternated during computation. The function is defined as lcs_space_optimized(str1, str2, str3).

Here’s an example:

def lcs_space_optimized(str1, str2, str3):
    l1, l2, l3 = len(str1), len(str2), len(str3)
    current_dp = [[0 for k in range(l3+1)] for j in range(l2+1)]
    prev_dp = [[0 for k in range(l3+1)] for j in range(l2+1)]

    for i in range(1, l1+1):
        current_dp, prev_dp = prev_dp, current_dp
        for j in range(1, l2+1):
            for k in range(1, l3+1):
                if str1[i-1] == str2[j-1] and str1[i-1] == str3[k-1]:
                    current_dp[j][k] = prev_dp[j-1][k-1] + 1
                else:
                    current_dp[j][k] = max(prev_dp[j][k], current_dp[j-1][k], current_dp[j][k-1])
    return current_dp[l2][l3]

print(lcs_space_optimized("abcdefgh", "abefgk", "bcegkl"))

Output: 3

This code alternates between two 2D DP matrices to store the latest two layers of the 3D DP matrix, thereby saving space. Each layer corresponds to a slice of the matrix at a given index i. It’s more space-efficient than the full 3D DP while still maintaining fast computation time.

Bonus One-Liner Method 5: Recursive with LRU Cache

This method employs Python’s functools module to add a Least Recently Used (LRU) cache to the classic recursive approach. The function lcs_lru_cache(str1, str2, str3) uses LRU cache with the @lru_cache() decorator to cache the results of recursive calls. Although not a one-liner, the core recursive logic is concisely expressed.

Here’s an example:

from functools import lru_cache

@lru_cache(maxsize=None)
def lcs_lru_cache(str1, str2, str3, i, j, k):
    if i == 0 or j == 0 or k == 0:
        return 0
    if str1[i-1] == str2[j-1] and str1[i-1] == str3[k-1]:
        return 1 + lcs_lru_cache(str1, str2, str3, i-1, j-1, k-1)
    return max(lcs_lru_cache(str1, str2, str3, i-1, j, k),
               lcs_lru_cache(str1, str2, str3, i, j-1, k),
               lcs_lru_cache(str1, str2, str3, i, j, k-1))

str1, str2, str3 = "abcdefgh", "abefgk", "bcegkl"
print(lcs_lru_cache(str1, str2, str3, len(str1), len(str2), len(str3)))

Output: 3

By using the @lru_cache() decorator from the functools module, the recursive calls’ results are cached. This optimization greatly improves the execution time of the recursive solution without engaging in the complexity of manual memoization or DP.

Summary/Discussion

  • Method 1: Recursive Approach. Intuitive and straightforward. Not suitable for long strings due to exponential time complexity.
  • Method 2: Memoization. Optimizes the recursive approach by caching, improving performance. Still has overhead due to recursion.
  • Method 3: Dynamic Programming. Highly efficient in time due to iterative computation. Requires substantial memory for 3D DP table.
  • Method 4: Space-Optimized DP. Preserves time efficiency and reduces memory footprint. Slightly more complex logic due to alternating matrices.
  • Method 5: Recursive with LRU Cache. Combines the simplicity of recursion with caching, providing a good balance between readability and efficiency.