5 Best Ways to Find N-Sized Substrings with K Distinct Characters in Python

March 4, 2024 by Emily Rosemary Collins

💡 Problem Formulation: Given a string, the task is to find all possible substrings of size n that contain exactly k distinct characters. For example, if the input string is “abcba”, n is 3, and k is 2, the expected output would be [“abc”, “bcb”, “cba”] since each of these substrings of length 3 has exactly 2 distinct characters.

Method 1: Brute Force Approach

This method involves iterating over all possible substrings of length n and checking if they have k distinct characters. It’s straightforward but may not be efficient for long strings since its time complexity is O(n^2).

Here’s an example:

def n_sized_substrings_with_k_distinct(s, n, k):
    result = []
    for i in range(len(s) - n + 1):
        substring = s[i:i+n]
        if len(set(substring)) == k:
            result.append(substring)
    return result

print(n_sized_substrings_with_k_distinct("abcba", 3, 2))

Output: [‘abc’, ‘bcb’, ‘cba’]

The function n_sized_substrings_with_k_distinct() takes a string and integer values for n and k, iteratively checks all substrings of length n, and adds those with k distinct characters to the result list, which it then returns.

Method 2: Sliding Window Technique

The sliding window technique efficiently finds the substrings by moving a fixed-size window over the string. This reduces redundant checks and is better for large strings with a time complexity of O(n).

Here’s an example:

from collections import defaultdict

def substrings_of_size_n_with_k_distinct(s, n, k):
    distinct_count = 0
    char_count = defaultdict(int)
    result = []
    for i in range(len(s)):
        char_count[s[i]] += 1
        if char_count[s[i]] == 1:
            distinct_count += 1
        
        if i >= n:
            char_count[s[i-n]] -= 1
            if char_count[s[i-n]] == 0:
                distinct_count -= 1
        
        if i >= n-1 and distinct_count == k:
            result.append(s[i-n+1:i+1])
    return result

print(substrings_of_size_n_with_k_distinct("abcba", 3, 2))

Output: [‘abc’, ‘bcb’, ‘cba’]

This function uses a sliding window and a hashmap to keep track of the count of characters in the current window. When a new character is added, if it’s not already in the window, the distinct count is increased. When the window moves past its size n, the oldest character’s count is decreased. The result is collected when the window size is n and the distinct character count is k.

Method 3: Using Counter from Collections

This method utilizes the Counter class from the Python collections module to count the distinct elements in each possible substring of size n. It offers a clean and readable implementation.

Here’s an example:

from collections import Counter

def find_substrings(s, n, k):
    result = []
    for i in range(len(s) - n + 1):
        substring = s[i:i+n]
        if len(Counter(substring)) == k:
            result.append(substring)
    return result

print(find_substrings("abcba", 3, 2))

Output: [‘abc’, ‘bcb’, ‘cba’]

The function find_substrings() iterates over each index in the given string, slicing substrings of length n and creating a Counter object from them to count distinct characters. Substrings with exactly k distinct characters are added to the result list.

Method 4: Optimized Sliding Window with Early Stopping

An optimized version of Method 2 includes an early stopping mechanism that terminates the loop as soon as the number of distinct characters exceeds k, improving efficiency.

Here’s an example:

def optimized_substrings(s, n, k):
    char_count = defaultdict(int)
    result = []
    distinct_count = 0
    for i in range(len(s)):
        char_count[s[i]] += 1
        if char_count[s[i]] == 1:
            distinct_count += 1

        while distinct_count > k:
            char_count[s[i-n+1]] -= 1
            if char_count[s[i-n+1]] == 0:
                distinct_count -= 1
            if i >= n-1:
                result.pop()

        if i >= n-1 and distinct_count == k:
            result.append(s[i-n+1:i+1])
    return result

print(optimized_substrings("abcba", 3, 2))

Output: [‘abc’, ‘bcb’, ‘cba’]

The function optimized_substrings() uses a sliding window as well but includes a while loop to immediately reduce the window size and distinct count when the number of distinct characters exceeds k. This reduces the number of unnecessary computations.

Bonus One-Liner Method 5: Using List Comprehension and Set

For Python enthusiasts who love one-liners, this method provides a compact solution using list comprehension and sets to find the desired substrings.

Here’s an example:

def one_liner_substrings(s, n, k):
    return [s[i:i+n] for i in range(len(s)-n+1) if len(set(s[i:i+n])) == k]

print(one_liner_substrings("abcba", 3, 2))

Output: [‘abc’, ‘bcb’, ‘cba’]

The function one_liner_substrings() uses a list comprehension to iterate over the string, slicing substrings of length n and using a set to count distinct characters, checking if the count matches k.

Summary/Discussion

Method 1: Brute Force. Simple to understand and implement. Not efficient for large strings.
Method 2: Sliding Window Technique. More efficient than brute force, especially for large strings. Requires a good understanding of sliding windows.
Method 3: Using Counter. Clean and readable. Performance is not as good as the optimized sliding window for very large strings.
Method 4: Optimized Sliding Window with Early Stopping. Offers the best performance by stopping early when possible. Slightly more complex due to optimization.
Bonus One-Liner Method 5: Using List Comprehension and Set. Very concise. However, it may be less readable and harder to debug for beginners.