5 Best Ways to Find All Occurrences of a Substring in a List of Strings with Python

💡 Problem Formulation: You’ve encountered a situation where you need to find every occurrence of a particular substring within each string in a given list. For instance, if you’re given the list ['apple pie', 'banana pie', 'apple tart'] and you’re looking for the substring 'apple', the desired output would be a list of indexes or strings where the substring is found, such as [0, 2] or a modified list with the occurrences highlighted.

Method 1: Using a List Comprehension and the in Operator

This method involves iterating over the list of strings with a list comprehension and checking if the substring is present in each string with the in operator. The method is simple, readable, and efficient for smaller lists or short strings.

Here’s an example:

strings = ['apple pie', 'banana pie', 'apple tart']
substring = 'apple'
occurrences = [index for index, string in enumerate(strings) if substring in string]

print(occurrences)

Output:

[0, 2]

This snippet creates a new list called occurrences, using a list comprehension that enumerates over the original list strings. For each string where the substring 'apple' is found, the index of that string is added to occurrences.

Method 2: Using the filter() Function and lambda

The filter() function in conjunction with a lambda function filters the list based on whether the substring exists in each string. This functional approach is both elegant and concise but may be less intuitive for those unfamiliar with functional programming.

Here’s an example:

strings = ['apple pie', 'banana pie', 'apple tart']
substring = 'apple'
occurrences = list(filter(lambda x: substring in x, strings))

print(occurrences)

Output:

['apple pie', 'apple tart']

This code uses the filter() function to apply a lambda that checks whether the substring is in each element of strings. The result is converted into a list comprising just those strings that contain the substring.

Method 3: Using the find() Method

The find() method returns the index of the first occurrence of the substring within a string, or -1 if the substring is absent. This technique is useful when you need the position within each string along with the string's index in the list.

Here’s an example:

strings = ['apple pie', 'banana pie', 'apple tart']
substring = 'apple'
occurrences = [(index, string.find(substring)) for index, string in enumerate(strings) if substring in string]

print(occurrences)

Output:

[(0, 0), (2, 0)]

Using a list comprehension, this snippet checks for the presence of the substring with the in operator and then calls the find() method on each matching string to obtain the starting position of the first occurrence. Each tuple in the result pairs the index of the string in the list with that position.
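Because find() reports only the first occurrence, collecting every position within a string needs a loop that restarts the search after each hit. The following is a minimal sketch of that idea. The helper name find_all, the extra sample string, and the choice to return an empty list for an empty substring are assumptions made for illustration.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub in text (overlaps included)."""
    if not sub:
        return []  # assumption: treat an empty substring as "no matches"
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return positions


strings = ['apple pie', 'banana pie', 'apple tart', 'pineapple and apple']
substring = 'apple'

# Map each list index to the positions found in that string, skipping non-matches.
all_positions = {}
for index, string in enumerate(strings):
    hits = find_all(string, substring)
    if hits:
        all_positions[index] = hits

print(all_positions)  # {0: [0], 2: [0], 3: [4, 14]}
```

Restarting at start + 1 rather than start + len(sub) is a design choice: it also reports overlapping occurrences, which may or may not be what you want.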

Method 4: Using Regular Expressions with re

Regular expressions provide a powerful way to search for patterns. By using Python’s re module, you can find complex patterns within strings. This method is highly flexible but can be overkill for simple substring searches, and it is typically slower than the in operator when all you need is a literal match.

Here’s an example:

import re

strings = ['apple pie', 'banana pie', 'apple tart']
substring = 'apple'
# re.escape() makes any special characters in the substring match literally
pattern = re.compile(re.escape(substring))
occurrences = [string for string in strings if pattern.search(string)]

print(occurrences)

Output:

['apple pie', 'apple tart']

In this snippet, the re.compile() function is used to compile a regular expression pattern, which is then searched in each string using the pattern.search() method. This method returns the strings that match the pattern.
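The same compiled pattern can also report where each match starts, not only whether a string matches. Here is a hedged sketch using pattern.finditer(). The variable name match_positions and the extra sample string are assumptions added for illustration, and re.escape() is used on the assumption that the substring should be matched literally.

```python
import re

strings = ['apple pie', 'banana pie', 'apple tart', 'apple and apple']
substring = 'apple'

# re.escape() keeps any regex metacharacters in the substring literal.
pattern = re.compile(re.escape(substring))

# For every string with at least one match, record (list index, [start positions]).
match_positions = []
for index, string in enumerate(strings):
    starts = [m.start() for m in pattern.finditer(string)]
    if starts:
        match_positions.append((index, starts))

print(match_positions)  # [(0, [0]), (2, [0]), (3, [0, 10])]
```

Note that finditer() yields non-overlapping matches only, scanning from left to right.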

Bonus One-Liner Method 5: Using List Comprehension and str.count()

This concise one-liner uses a list comprehension with str.count() to count the non-overlapping occurrences of the substring in each string. It is especially handy when the number of occurrences is all you need and you want a quick, readable solution.

Here’s an example:

strings = ['apple pie', 'banana pie', 'apple tart']
substring = 'apple'
occurrences = [string.count(substring) for string in strings]

print(occurrences)

Output:

[1, 0, 1]

This one-liner iterates through the list and calls str.count() on each string to count how many times the substring appears, collecting the counts in a new list. A count of 0 means the substring does not occur in that string.
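One detail worth knowing is that str.count() counts non-overlapping occurrences. If overlapping occurrences should count as well, one possible approach is a regular-expression lookahead, sketched below. The helper name count_overlapping, the sample data, and the choice to return 0 for an empty substring are assumptions made for illustration.

```python
import re


def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    if not sub:
        return 0  # assumption: an empty substring counts as zero matches
    # A lookahead matches at each position without consuming characters,
    # so a match starting inside a previous match is still found.
    return len(re.findall(f'(?={re.escape(sub)})', text))


strings = ['banana', 'bandana', 'apple pie']
substring = 'ana'

plain_counts = [string.count(substring) for string in strings]
overlap_counts = [count_overlapping(string, substring) for string in strings]

print(plain_counts)    # [1, 1, 0]
print(overlap_counts)  # [2, 1, 0]
```

In 'banana', the substring 'ana' starts at positions 1 and 3. The two occurrences share a character, so str.count() reports only one of them.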

Summary/Discussion

  • Method 1: List Comprehension with in. Strengths: Readable and straightforward. Weaknesses: Not suitable for complex patterns; cannot provide counts.
  • Method 2: filter() with lambda. Strengths: Elegant functional programming style. Weaknesses: Can be less readable; still no counts or indices.
  • Method 3: Using find(). Strengths: Provides the position of the first occurrence within each string. Weaknesses: Slightly more complicated; reports only the first occurrence and gives no count.
  • Method 4: Regular Expressions. Strengths: Extremely powerful and flexible for complex patterns. Weaknesses: Often overcomplicated for simple tasks; typically slower than plain string operations for literal substrings.
  • Method 5: One-Liner with str.count(). Strengths: Very concise; provides counts directly. Weaknesses: Doesn’t provide positions or indices; counts only non-overlapping occurrences.