5 Best Ways to Find the Length of the Longest Substring Containing K Distinct Characters in Python

Rate this post

πŸ’‘ Problem Formulation: Given a string, the challenge is to find the length of the longest substring that contains exactly k distinct characters. For instance, if the input is “abcba” and k is 2, the longest substrings with 2 distinct characters are “abc” and “bcb“, both with a length of 3. Hence, the desired output is 3.

Method 1: Brute Force Approach

Using the brute force method involves generating all possible substrings of the given string and then checking each substring for exactly k distinct characters. While this method is straightforward, it is not efficient for large strings due to its O(n^3) time complexity.

Here’s an example:

def longest_substring_brute_force(s, k):
    max_length = 0
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            if len(set(s[start:end])) == k:
                max_length = max(max_length, end - start)
    return max_length

print(longest_substring_brute_force("abcba", 2))

Output: 3

This code defines a function longest_substring_brute_force() that receives a string and an integer k. It initializes max_length to 0 and iteratively explores all substrings, updating max_length if a substring of length k distinct characters is found. However, due to the nested loops, the performance can degrade sharply as the size of the input grows.

Method 2: Sliding Window Technique

This method improves the efficiency of the search by using a sliding window to maintain a dynamic set of characters. The window is expanded and shrunk to always include up to k distinct characters, tracking the maximum length sub-string found. This approach has an O(n) time complexity.

Here’s an example:

def longest_substring_sliding_window(s, k):
    char_map = {}
    left, max_length = 0, 0
    for right in range(len(s)):
        char_map[s[right]] = char_map.get(s[right], 0) + 1
        while len(char_map) > k:
            char_map[s[left]] -= 1
            if char_map[s[left]] == 0:
                del char_map[s[left]]
            left += 1
        max_length = max(max_length, right - left + 1)
    return max_length

print(longest_substring_sliding_window("abcba", 2))

Output: 3

In the longest_substring_sliding_window() function, we create a dictionary char_map to count character occurrences. The variables left and right represent the window edges. The loop expands the window and adjusts char_map. If the distinct character count exceeds k, the left edge of the window moves up until the condition is met, continuously checking for the maximum length. This method is practical for larger datasets due to its linear time complexity.

Method 3: Optimized Sliding Window with OrderedDict

This method builds upon the sliding window technique but uses Python’s collections.OrderedDict to efficiently keep track of the order in which characters appear. This allows us to remove the oldest entry quickly when we exceed k distinct characters, potentially improving performance in some scenarios.

Here’s an example:

from collections import OrderedDict

def longest_substring_ordered_dict(s, k):
    char_map = OrderedDict()
    max_length = 0
    
    for char in s:
        if char in char_map:
            del char_map[char]
        char_map[char] = None
        if len(char_map) > k:
            char_map.popitem(last=False)
        max_length = max(max_length, len(char_map))

    return max_length

print(longest_substring_ordered_dict("abcba", 2))

Output: 3

The function longest_substring_ordered_dict() illustrates an optimized strategy for the sliding window method using an OrderedDict. This approach leverages the ordered nature of OrderedDict to quickly discard the oldest character when needed. The function ensures that at most k distinct characters are kept in the char_map, maintaining the maximum length of the substring throughout the iteration.

Method 4: Frequency Table with Two Pointers

This method uses a frequency table to track character counts and two pointers to manage the window’s boundaries. One key difference is the explicit use of the frequency table to ensure quick updates and lookups of character counts within the window.

Here’s an example:

def longest_substring_two_pointers(s, k):
    char_freq = {}
    left = result = 0
    
    for right, char in enumerate(s):
        char_freq[char] = char_freq.get(char, 0) + 1
        
        while len(char_freq) > k:
            char_freq[s[left]] -= 1
            if char_freq[s[left]] == 0:
                del char_freq[s[left]]
            left += 1
        
        result = max(result, right - left + 1)
    
    return result

print(longest_substring_two_pointers("abcba", 2))

Output: 3

The longest_substring_two_pointers() function uses a dictionary char_freq to count the characters and two pointers (left and right) to adjust the window size. Similar to the sliding window technique, it iterates over the string, updating counts and checking the number of distinct characters. When the character limit is exceeded, it reduces the count of the leftmost character and shifts the left pointer rightwards, constantly updating the result.

Bonus One-Liner Method 5: Using itertools and Set

A more Pythonic, albeit less efficient, one-line solution using itertools. This method leverages Python’s comprehensive standard library to express the solution very succinctly at the cost of potentially increased time complexity.

Here’s an example:

from itertools import combinations

def longest_substring_oneliner(s, k):
    return max((len(s[a:b]) for a, b in combinations(range(len(s) + 1), 2) if len(set(s[a:b])) == k), default=0)

print(longest_substring_oneliner("abcba", 2))

Output: 3

The longest_substring_oneliner() expression here makes use of Python’s generator expressions and the combinations function from the itertools module to generate all possible substrings. Due to the combination of range and set operations, it checks for the distinct character condition and finds the maximum length. As noted, this method is concise but may not be practical for long strings.

Summary/Discussion

  • Method 1: Brute Force Approach. Simple to understand and implement. Not suitable for large strings due to its O(n^3) time complexity.
  • Method 2: Sliding Window Technique. Significantly more efficient with linear time complexity. Suitable for most practical scenarios.
  • Method 3: Optimized Sliding Window with OrderedDict. Adds a potential performance boost for certain cases. Efficiently manages character order within the window.
  • Method 4: Frequency Table with Two Pointers. Similar benefits to the sliding window technique. The explicit frequency table can simplify the understanding and manipulation of character counts.
  • Method 5: Using itertools and Set. Offers a Pythonic one-liner. Best used when code conciseness is preferred over performance, especially with shorter strings.