# 5 Best Ways to Find the Smallest Substring Containing a Specific String in Python

π‘ Problem Formulation: You are given two strings, a larger string and a query string. Your task is to find the smallest substring within the larger string that contains all characters of the query string. For instance, given the string “xyyzyzyx” and the query “xyz”, you want the shortest substring that includes at least one ‘x’, one ‘y’, and one ‘z’. The desired output in this case would be “zyx”.

## Method 1: Using Two Pointers and a Hashtable

This method involves iterating over the larger string with two pointers, expanding and contracting the window of the substring, and keeping track of the characters using a hashtable. The key is to find the minimum length window that contains all characters of the query string. This approach is efficient for longer strings as it ensures that each character is visited only once.

Here’s an example:

```from collections import defaultdict

def min_window_substring(s, t):
if not t or not s:
return ""

dict_t = defaultdict(int)
for char in t:
dict_t[char] += 1

required = len(dict_t)
l, r = 0, 0
formed = 0
window_counts = defaultdict(int)

ans = float("inf"), None, None

while r < len(s):
character = s[r]
window_counts[character] += 1

if character in dict_t and window_counts[character] == dict_t[character]:
formed += 1

while l <= r and formed == required:
character = s[l]

if r - l + 1 < ans[0]:
ans = (r - l + 1, l, r)

window_counts[character] -= 1
if character in dict_t and window_counts[character] < dict_t[character]:
formed -= 1

l += 1

r += 1

return "" if ans[0] == float("inf") else s[ans[1] : ans[2] + 1]

print(min_window_substring("xyyzyzyx", "xyz"))```

Output:

`zyx`

In the code snippet above, we define a function `min_window_substring` that takes two parameters, `s` (the larger string) and `t` (the query string). We use a two-pointer approach to expand and contract the current window `s[l:r+1]` and update the window’s character count in a hashtable. Whenever the window has all the characters of the query string (`formed == required`), we check if the current window length is smaller than the previously found smallest window and update the answer accordingly.

## Method 2: Brute Force with Nested Loops

In the brute force approach, we use nested loops to explore every possible substring of the larger string and check if it contains all characters of the query string. Although this method is straightforward, it has a high time complexity and is not suitable for long strings or large character sets.

Here’s an example:

```def contains_all_chars(subs, query):
for char in query:
if char not in subs:
return False
return True

def brute_force_substring(s, t):
min_len = float("inf")
min_sub = ""

for i in range(len(s)):
for j in range(i, len(s)):
subs = s[i:j+1]
if contains_all_chars(subs, t) and len(subs) < min_len:
min_len = len(subs)
min_sub = subs
return min_sub

print(brute_force_substring("xyyzyzyx", "xyz"))```

Output:

`zyx`

The above example implements a simple brute force solution starting with a function `contains_all_chars` that checks if all characters in `query` are in the substring `subs`. The function `brute_force_substring` iterates through every possible substring of `s` and employs `contains_all_chars` to verify if a substring contains the query string. If it does, and if the substring’s length is less than the current minimum, we update `min_len` and `min_sub` to hold the new smallest substring.

## Method 3: Optimized Brute Force with Early Stopping

To slightly optimize the brute force approach, we can incorporate an early stopping mechanism. This involves breaking out of the inner loop once we encounter a substring that contains all the characters of the query string, since any longer substring starting at the same point will not be a candidate for the shortest substring.

Here’s an example:

```from itertools import combinations

def optimized_brute_force(s, t):
if not s or not t:
return ""

for length in range(1, len(s) + 1):
for start in range(len(s) - length + 1):
subs = s[start:start+length]
if all(char in subs for char in t):
return subs
return ""

print(optimized_brute_force("xyyzyzyx", "xyz"))```

Output:

`zyx`

In this code, the `optimized_brute_force` function iterates over the lengths of potential substrings starting from 1 and ending at the length of the input string `s`. For each length, we check all substrings of that size and use the built-in `all()` function to determine if all characters from the query string `t` are present in the current substring. This method is optimized because it stops as soon as it finds the smallest substring rather than continuing to check unnecessary longer substrings.

## Method 4: Sliding Window with Character Count Optimization

The sliding window technique can also be optimized by keeping count of the number of characters from the query string found in the current window. This variation reduces the number of times we need to iterate over the entire window to check for the presence of all query string characters.

Here’s an example:

```from collections import Counter

def sliding_window_optimized(s, t):
if not t or not s:
return ""

dict_t = Counter(t)
required = len(dict_t)
l, r = 0, 0
formed = 0
window_counts = defaultdict(int)

ans = float("inf"), None, None

while r < len(s):
character = s[r]
window_counts[character] += 1

if character in dict_t and window_counts[character] == dict_t[character]:
formed += 1

while l <= r and formed == required:
character = s[l]
if r - l + 1 < ans[0]:
ans = (r - l + 1, l, r)
window_counts[character] -= 1
if character in dict_t and window_counts[character] < dict_t[character]:
formed -= 1
l += 1

r += 1

return "" if ans[0] == float("inf") else s[ans[1]:ans[2] + 1]

print(sliding_window_optimized("xyyzyzyx", "xyz"))```

Output:

`zyx`

Similar to Method 1, this version of the sliding window algorithm uses a `Counter` from the `collections` module for `dict_t` and keeps a running count of window characters in `window_counts`. It improves on Method 1 by potentially reducing the number of window characters we need to count, as it only counts characters from the query string. When a valid window is found, it checks if it’s the smallest so far and updates the answer accordingly. It then adjusts the counts and `formed` variable while collapsing the window from the left.

## Bonus One-Liner Method 5: Using Regular Expressions

If you appreciate concise and potentially “clever” code, you can attempt to use Python’s regex library, though it is generally not as efficient as the sliding window algorithm and may not handle all edge cases. Nevertheless, utilizing backreferences and lookahead assertions can lead to a one-liner solution.

Here’s an example:

```import re

def regex_substring(s, t):
regex = f"(?=(.*{'.*'.join(sorted(set(t)))}.*))"
matches = [match.group(1) for match in re.finditer(regex, s) if match.group(1)]
return min(matches, key=len) if matches else ""

print(regex_substring("xyyzyzyx", "xyz"))```

Output:

`zyx`

The function `regex_substring` constructs a regular expression pattern that uses lookahead assertions to find all substrings that contain all characters in the query string (the characters are sorted and deduplicated first). The regular expression `(?=(.*{'.*'.join(sorted(set(t)))}.*))` ensures that all characters of the query are in the substring. It returns the shortest match from the list of all matches or an empty string if no match is found.

## Summary/Discussion

• Method 1: Two Pointers and a Hashtable. This is a highly efficient and commonly recommended approach for finding the smallest substring. The space complexity increases with the number of unique characters in the query string.
• Method 2: Brute Force with Nested Loops. Simple and straightforward but has a significant time complexity that makes it impractical for large strings or sets of characters.
• Method 3: Optimized Brute Force with Early Stopping. A small improvement over the plain brute force approach that can reduce the search space by stopping early, but still suffers from high time complexity with larger inputs.
• Method 4: Sliding Window with Character Count Optimization. An enhanced version of the sliding window technique that can be more efficient by focusing only on the characters of the query string.
• Method 5: Using Regular Expressions. A one-liner that is elegant but not always efficient or comprehensive for all possible nuances of the problem, and it may struggle with large input strings or complex character sets.