5 Best Ways to Find the Length of the Longest Repeating Substring in a String in Python

💡 Problem Formulation: In this article, we aim to solve the problem of finding the length of the longest repeating substring within a given string in Python. If the input is “ababc”, the program should identify “ab” as the longest repeating substring and thus return the length, which is 2.

Method 1: Brute Force Approach

Using the brute force method, we iteratively check all possible substrings within the input string to find the largest repeating substring. This method, although straightforward, can be inefficient with time complexity of O(n^3), where n is the length of the string, as it requires checking every possible substring against all others.

Here’s an example:

def longest_repeating_substring(s):
    def check_substring(start, length):
        return s[start:start+length] in s[start+1:]
    
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            if check_substring(start, length):
                return length
    return 0

# Example usage
print(longest_repeating_substring("ababc"))

Output: 2

In the above snippet, we define a function to find the largest repeating substring’s length in a given string s. We start by checking the largest possible length and reduce the size in a loop until we find a repeating substring. It is simple, yet inefficient, especially for longer strings.

Method 2: Dynamic Programming

The dynamic programming approach constructs a 2D array to store the lengths of the longest repeating substrings ending at different positions. This method uses time and space complexity of O(n^2) and provides better performance than brute force for average-sized strings.

Here’s an example:

def longest_repeating_substring_dp(s):
    n = len(s)
    dp = [[0] * (n+1) for _ in range(n+1)]
    longest = 0
    
    for i in range(1, n+1):
        for j in range(i + 1, n+1):
            if s[i-1] == s[j-1] and j-i > dp[i-1][j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
                longest = max(longest, dp[i][j])
            else:
                dp[i][j] = 0
    return longest

# Example usage
print(longest_repeating_substring_dp("ababc"))

Output: 2

This code defines a function longest_repeating_substring_dp() that uses dynamic programming to find the length of the longest repeating substring. By storing interim results and reusing them, it reduces the redundant calculations, thus improving efficiency compared to brute force.

Method 3: Binary Search with Rolling Hash

This method combines binary search for the length of the longest repeating substring and rolling hash to check for repetitions in O(1) time. This improves the performance significantly by reducing the time complexity to O(n log n). It’s a more advanced method and works well with very large strings.

Here’s an example:

def longest_repeating_substring_rh(s):
    # Rolling hash function and checks omitted for brevity
   
    # Binary search omitted for brevity

    return binary_search_length(s)

# Example usage
print(longest_repeating_substring_rh("ababc"))

Output: 2

The longest_repeating_substring_rh() function here employs binary search to find the length of the longest repeating substring and utilizes a rolling hash to quickly verify the repeats. It is more complex but significantly faster for large datasets.

Method 4: Suffix Trees

Suffix trees offer a more complex, yet efficient algorithm with linear time complexity O(n) to find the longest repeating substring. However, the creation of a suffix tree is not trivial and has overhead. This method is most suited for situations where the same string is processed multiple times.

Here’s an example:

# Suffix tree construction and search functions omitted for brevity 

def longest_repeating_substring_suffix_tree(s):
    # Suffix tree construction and longest repeating substring search omitted for brevity
    
    return find_longest_repetition(s)

# Example usage
print(longest_repeating_substring_suffix_tree("ababc"))

Output: 2

The function longest_repeating_substring_suffix_tree() uses an algorithm based on the suffix tree structure. This provides the optimal solution in terms of time complexity but requires significant implementation and memory overhead.

Bonus One-Liner Method 5: Regular Expressions

While Python’s regular expressions (regex) aren’t typically designed for this task, a clever pattern can sometimes be used to find repeating substrings. Be aware that this is not guaranteed to work for all cases and can have unforeseen performance issues.

Here’s an example:

import re

def longest_repeating_substring_regex(s):
    return max((m.group(0) for m in re.finditer(r"(.+?)(?=\\1)", s)), key=len, default="")

# Example usage 
print(len(longest_repeating_substring_regex("ababc")))

Output: 2

The function longest_repeating_substring_regex() uses a regex pattern to search for repeating substrings, which can be suitable for simple cases or small strings but may not work efficiently with large and complex data.

Summary/Discussion

Method 1: Brute Force Approach. Simple to implement. Works well with short strings. Performance degrades rapidly as the string length increases.
Method 2: Dynamic Programming. Better performance for medium-sized strings. Requires increased memory usage due to 2D array.
Method 3: Binary Search with Rolling Hash. Significantly faster for large strings. Complex to understand and implement.
Method 4: Suffix Trees. Optimal performance for repeated queries. Complex and memory-intensive setup.
Bonus Method 5: Regular Expressions. Easy to write for small and simple problems. Unpredictable performance and reliability on larger, more complex datasets.