5 Best Ways to Find the Length of the Longest Substring with Two Distinct Elements in Python

💡 Problem Formulation: In this article, we tackle the challenge of identifying the longest substring within a given string that contains no more than two distinct characters. This problem often appears in coding interviews and algorithmic challenges. For instance, given the input string “aabcbcbb”, the desired output is 6, corresponding to the substring “bcbcbb”, which consists only of the characters ‘b’ and ‘c’.

Method 1: Using a Sliding Window

The sliding window technique solves this problem in linear time. Two indices, left and right, mark the edges of the window. As right advances across the string, a dictionary counts how often each character occurs inside the window; whenever the dictionary holds more than two keys, left moves forward until only two distinct characters remain. The built-in max() keeps track of the longest valid window seen so far.

Here’s an example:

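def longest_substring(s):
    left, max_length = 0, 0
    char_count = {}  # character -> number of occurrences inside the window

    for right, char in enumerate(s):
        # Extend the window to the right by one character.
        char_count[char] = char_count.get(char, 0) + 1

        # Shrink from the left until at most two distinct characters remain.
        while len(char_count) > 2:
            char_count[s[left]] -= 1
            if char_count[s[left]] == 0:
                del char_count[s[left]]
            left += 1

        # s[left:right + 1] is now a valid window.
        max_length = max(max_length, right - left + 1)

    return max_length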

print(longest_substring("aabcbcbb"))

Output:

6

This code snippet defines a function longest_substring that takes a string s and returns the length of the longest substring with up to two distinct characters. The algorithm uses two pointers to represent the boundaries of the sliding window and a character count dictionary to limit the window to two unique characters, resizing it when necessary and updating the longest length.
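The problem statement names the substring itself, not just its length. A minimal sketch, assuming you want the text back as well: the hypothetical function longest_substring_with_text below is Method 1 with one extra variable that records where the best window starts.

```python
def longest_substring_with_text(s):
    """Return (length, substring) for the longest part of s with at most two distinct characters.

    Hypothetical helper: Method 1 plus bookkeeping for the best window's start index.
    """
    left = 0
    best_start, best_length = 0, 0
    char_count = {}  # character -> occurrences inside the window

    for right, char in enumerate(s):
        char_count[char] = char_count.get(char, 0) + 1

        # Shrink from the left until at most two distinct characters remain.
        while len(char_count) > 2:
            char_count[s[left]] -= 1
            if char_count[s[left]] == 0:
                del char_count[s[left]]
            left += 1

        # Remember where the best window starts, not just how long it is.
        if right - left + 1 > best_length:
            best_start, best_length = left, right - left + 1

    return best_length, s[best_start:best_start + best_length]


print(longest_substring_with_text("aabcbcbb"))  # (6, 'bcbcbb')
```

Because the comparison uses a strict >, ties keep the leftmost window; an empty string yields (0, '').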

Method 2: Optimized Brute Force

The brute force approach starts a substring at every index and extends it one character at a time. The optimization is to break out of the inner loop as soon as the next character would introduce a third distinct character, since no longer substring from that starting point can be valid. It uses nested loops and a set to track the distinct characters.

Here’s an example:

def longest_substring(s):
    max_length = 0

    for i in range(len(s)):
        # Grow a substring that starts at index i.
        unique_chars = set()
        current_length = 0

        for j in range(i, len(s)):
            # Stop as soon as s[j] would become a third distinct character.
            if s[j] not in unique_chars and len(unique_chars) == 2:
                break
            unique_chars.add(s[j])
            current_length += 1

            max_length = max(max_length, current_length)
    return max_length

print(longest_substring("aabcbcbb"))

Output:

6

In this code snippet, we define a function longest_substring that iterates over each character in the string using nested loops. It starts a new substring from every character and extends it until it encounters more than two distinct characters. The length of such substrings is compared to find the longest one.
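Because the brute force is easy to reason about, it also makes a handy reference for testing the faster methods. The sketch below is an illustration, not one of the article's methods: the names brute_force, sliding_window and cross_check are hypothetical, and it simply compares the two approaches on random short strings drawn from a small alphabet.

```python
import random


def brute_force(s):
    """O(n^2) reference: extend from every start index while at most two distinct characters."""
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > 2:
                break
            best = max(best, j - i + 1)
    return best


def sliding_window(s):
    """O(n) sliding window, following the same logic as Method 1."""
    left = best = 0
    counts = {}
    for right, ch in enumerate(s):
        counts[ch] = counts.get(ch, 0) + 1
        while len(counts) > 2:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        best = max(best, right - left + 1)
    return best


def cross_check(trials=1000, max_len=12, alphabet="abcd", seed=0):
    """Raise AssertionError on the first random string where the two methods disagree."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        assert brute_force(s) == sliding_window(s), s
    return True


print(cross_check())  # True
```

A small alphabet such as "abcd" makes third characters frequent, which exercises the shrinking step far more often than random text would.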

Method 3: Two-Pointer Technique without Dictionary

This method keeps the two-pointer idea but drops the dictionary of character counts. Because the window only ever holds two distinct characters, a few scalar variables are enough to decide where the window must restart when a third character appears. The pointer updates need careful handling, especially at the start of the string.

Here’s an example:

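def longest_substring(s):
    max_length = 0
    start = 0        # left edge of the current window
    run_start = 0    # index where the current run of identical characters begins
    other = None     # the window's other character (the one that is not s[j - 1])

    for j, ch in enumerate(s):
        if j > 0 and ch != s[j - 1]:
            if other is not None and ch != other:
                # ch is a third distinct character: the new window keeps
                # only the most recent run of s[j - 1].
                start = run_start
            other = s[j - 1]
            run_start = j

        max_length = max(max_length, j - start + 1)

    return max_length

print(longest_substring("aabcbcbb"))

Output:

6

The code defines a function longest_substring that needs no dictionary. Besides the window's left edge start, it remembers only other, the window's second character, and run_start, the index where the current run of repeated characters began. When a character arrives that matches neither the previous character nor other, the window cannot be extended, so it restarts at run_start and keeps only the latest run of the previous character. This guarantees that no more than two distinct characters are ever inside the window, and the string is scanned only once.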
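To watch how a dictionary-free scan moves its window, here is a minimal tracing sketch. The function name trace_two_pointer and its state variables (start, run_start, other) are illustrative choices: start is the window's left edge, run_start is where the current run of identical characters began, and other is the window's second character.

```python
def trace_two_pointer(s):
    """Print the window after each character of a dictionary-free two-pointer scan."""
    start = run_start = 0
    other = None  # the window's character that differs from s[j - 1], if any
    best = 0
    for j, ch in enumerate(s):
        if j > 0 and ch != s[j - 1]:
            if other is not None and ch != other:
                # Third distinct character: keep only the latest run of s[j - 1].
                start = run_start
            other = s[j - 1]
            run_start = j
        best = max(best, j - start + 1)
        print(f"j={j} ch={ch!r} window={s[start:j + 1]!r} best={best}")
    return best


trace_two_pointer("aabcbcbb")
```

On “aabcbcbb” the trace shows the window collapsing from 'aab' to 'bc' when 'c' arrives at j=3, then growing back to 'bcbcbb' by the last character, for a best length of 6.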

Method 4: Dynamic Window with Collections Module

Python’s collections module can streamline a dynamic sliding window. Its OrderedDict remembers the order of its keys, and move_to_end() pushes a character to the back every time it reappears, so the keys stay ordered by each character’s last occurrence. When a third distinct character enters the window, the first key is exactly the character to evict, which makes resizing the window cheap.

Here’s an example:

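from collections import OrderedDict

def longest_substring(s):
    # Maps each character in the window to the index of its latest occurrence,
    # ordered from least recently seen to most recently seen.
    char_map = OrderedDict()
    max_length = start = 0

    for i, char in enumerate(s):
        char_map[char] = i
        char_map.move_to_end(char)  # mark char as the most recently seen

        if len(char_map) > 2:
            # Evict the character whose last occurrence is oldest; the window
            # must begin just after that occurrence.
            _, last_index = char_map.popitem(last=False)
            start = last_index + 1

        max_length = max(max_length, i - start + 1)

    return max_length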

print(longest_substring("aabcbcbb"))

Output:

6

This code snippet creates a function longest_substring that keeps an ordered dictionary, char_map, with characters as keys and their latest index as values. If more than two distinct characters are found, the character whose latest occurrence is oldest is popped from the dictionary, and the window start moves to just after that occurrence. The maximum length is updated on every iteration.
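Since Python 3.7, a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. One catch: dict.popitem() removes the most recently inserted item, so the oldest key has to be located with next(iter(...)) instead. The function name longest_substring_dict is hypothetical.

```python
def longest_substring_dict(s):
    """Method 4 rewritten with a plain dict (relies on insertion order, Python 3.7+)."""
    last_seen = {}  # char -> index of its latest occurrence, oldest first
    max_length = start = 0
    for i, char in enumerate(s):
        # Removing and re-inserting moves char to the end, like move_to_end().
        last_seen.pop(char, None)
        last_seen[char] = i
        if len(last_seen) > 2:
            # dict.popitem() is LIFO, so take the oldest key via iteration.
            oldest = next(iter(last_seen))
            start = last_seen.pop(oldest) + 1
        max_length = max(max_length, i - start + 1)
    return max_length


print(longest_substring_dict("aabcbcbb"))  # 6
```

The trade-off is readability: OrderedDict states the intent directly with move_to_end() and popitem(last=False), while the plain dict version leans on a less obvious pop-and-reinsert idiom.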

Bonus One-Liner Method 5: Using List Comprehension

A compact solution fits in a single expression using comprehension syntax (a generator expression) and set, although it is far from efficient. This one-liner generates every substring, keeps those with at most two distinct characters, and returns the length of the longest one.

Here’s an example:

s = "aabcbcbb"
print(max((len(s[i:j]) for i in range(len(s)) for j in range(i + 1, len(s) + 1) if len(set(s[i:j])) <= 2), default=0))

Output:

6

This one-liner passes a generator expression to max(). Its nested for clauses produce every substring s[i:j], the set data structure counts that substring’s distinct characters, and only substrings with at most two of them contribute their length. The default=0 argument makes max() return 0 instead of raising ValueError when nothing qualifies, for example when s is empty.

Summary/Discussion

  • Method 1: Sliding Window. Runs in O(n) time and scales to large inputs. Requires understanding how the window grows and shrinks.
  • Method 2: Optimized Brute Force. Simpler to understand but O(n²) in the worst case, so slower than Method 1. Breaking early skips some unnecessary work.
  • Method 3: Two-Pointer without Dictionary. Uses only a few scalar variables and a single pass. Requires careful handling of the pointer updates.
  • Method 4: Collections Module. Elegant and readable thanks to OrderedDict. Still linear time, though OrderedDict adds some constant-factor overhead compared with a plain dictionary.
  • Method 5: List Comprehension. Quick to write as a single line, but it builds every substring and a set for each one (roughly O(n³)), so it suits only short strings or small datasets.