# 5 Best Ways to Count K-Length Substrings Occurring More Than Once in a Python String

Rate this post

π‘ Problem Formulation: You are given a string and are tasked to find and count the substrings of length `k` that appear more than once within that string. For example, in the string “ababc”, for `k=2`, the substring “ab” occurs twice; thus, the desired output is 1, indicating there is one such substring.

## Method 1: Brute Force with Substring Slicing

This traditional approach involves iterating through the string while slicing out substrings of length `k` and incrementing their count in a hash map. It’s a straightforward method but can be inefficient for larger strings due to its O(n^2) complexity.

Here’s an example:

```def count_k_length_substrings(s, k):
substring_count = {}
for i in range(len(s) - k + 1):
substring = s[i:i+k]
substring_count[substring] = substring_count.get(substring, 0) + 1
return sum(1 for count in substring_count.values() if count > 1)

print(count_k_length_substrings("ababc", 2))
```

Output:

`1`

This code defines a function `count_k_length_substrings` that iterates through the string `s`, extracts all possible substrings of length `k`, and stores their frequency in a dictionary `substring_count`. It then returns the count of such substrings that have more than one occurrence.

## Method 2: Using Collections’ Counter

The `Counter` class from Python’s Collections module can be used to count the occurrences of unique elements when passed an iterable. This method optimizes the data collection phase by using a specialized counting object.

Here’s an example:

```from collections import Counter

def count_k_length_substrings(s, k):
substrings = (s[i:i+k] for i in range(len(s) - k + 1))
counts = Counter(substrings)
return sum(1 for count in counts.values() if count > 1)

print(count_k_length_substrings("ababc", 2))
```

Output:

`1`

This snippet uses a generator expression to create all possible substrings of length `k` from the string `s`. The `Counter` object then tallies these substrings, and the sum is calculated for those with more than one occurrence, which is returned as the result.

## Method 3: Sliding Window Technique

This method enhances efficiency by using a sliding window to keep track of substring counts, essentially updating counts without re-slicing strings. It is more space and time-efficient, particularly for large strings.

Here’s an example:

```def count_k_length_substrings(s, k):
substring_count = {}
for i in range(len(s) - k + 1):
substring = s[i:i+k]
substring_count[substring] = substring_count.get(substring, 0) + 1

window_start = 0
unique_count = 0
for i in range(k, len(s)):
if substring_count[s[window_start:i]] > 1:
unique_count += 1
substring_count[s[window_start:i]] -= 1
window_start += 1
substring = s[i:i+k]
substring_count[substring] = substring_count.get(substring, 0) + 1
return unique_count

print(count_k_length_substrings("ababc", 2))
```

Output:

`1`

The code utilizes a sliding window to update the counts of substrings dynamically. It first counts all substrings of length `k` as before, then slides the window across the string, updating counts and thus avoiding redundant recounting.

## Method 4: Using DefaultDict for Counting

The `defaultdict` from Collections allows us to automatically initialize dictionary values, simplifying the counting logic. This method is a variation of the first method with a cleaner implementation.

Here’s an example:

```from collections import defaultdict

def count_k_length_substrings(s, k):
substring_count = defaultdict(int)
for i in range(len(s) - k + 1):
substring = s[i:i+k]
substring_count[substring] += 1
return sum(1 for count in substring_count.values() if count > 1)

print(count_k_length_substrings("ababc", 2))
```

Output:

`1`

Here, we use `defaultdict` to create a dictionary with a default integer value. This means we don’t need to check if a key exists before incrementing its value, making the code cleaner and slightly more efficient than a standard dictionary.

## Bonus One-Liner Method 5: Using List Comprehension and Set

A one-liner solution that uses a set for uniqueness combined with list comprehension to count substrings can be both terse and expressive.

Here’s an example:

```def count_k_length_substrings(s, k):
return len({s[i:i+k] for i in range(len(s) - k + 1) if s.count(s[i:i+k]) > 1})

print(count_k_length_substrings("ababc", 2))
```

Output:

`1`

This method uses a set comprehension to identify all unique substrings with a count greater than one in the string, and the length of the resultant set gives the desired count.

## Summary/Discussion

• Method 1: Brute Force with Substring Slicing. Simple to understand. Good for small strings. Inefficient for larger strings due to O(n^2) complexity.
• Method 2: Using Collections’ Counter. More Pythonic with built-in optimizations. Suitable for average-sized strings. Performance gain over brute force.
• Method 3: Sliding Window Technique. Highly efficient for larger strings. Reduces time complexity. Requires more complex logic to understand.
• Method 4: Using DefaultDict for Counting. Cleaner implementation of the brute force method. Avoids key existence checks. Can have a slight performance advantage over standard dictionaries.
• Bonus Method 5: Using List Comprehension and Set. Elegant one-liner. Less efficient for large strings due to repeated `count` operations. Best for quick small-scale tasks.