5 Best Ways to Find All Substrings Within a List of Strings in Python

💡 Problem Formulation: We are often tasked with identifying subsets of text within a larger dataset. Specifically, in Python, the challenge might entail finding all strings within a list that are substrings of other strings in that list. For example, given the list ['hello', 'hello world', 'ell', 'world'], we would expect to identify 'hello', 'ell', and 'world' as substrings contained within the list entries.

Method 1: Brute Force

The brute force method compares each string in the list with every other string to check whether it is a substring. This approach is simple to implement but scales poorly: a list of n strings requires up to O(n^2) substring checks, and each check itself takes time that grows with the lengths of the strings involved.

Here's an example:

list_of_strings = ['hello', 'hello world', 'ell', 'world']
substrings = []

for s in list_of_strings:
    for other in list_of_strings:
        if s != other and s in other:
            substrings.append(s)
            break

print(substrings)

Output:

['hello', 'ell', 'world']

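This code snippet iterates over each string in the list and compares it with every other string to check if it's a substring. The break stops the inner loop at the first match, so each list entry is appended at most once. Two details follow from comparing by value with s != other: identical entries never count as substrings of each other, and a string that appears several times in the input can appear several times in the result.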
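To make the behaviour with repeated entries concrete, here is a minimal sketch that wraps the same loop in a function. The function name find_substrings_brute_force and the second input list are illustrative assumptions, not part of the original example.

```python
def find_substrings_brute_force(strings):
    """Return each entry that occurs inside a different entry of the list."""
    found = []
    for s in strings:
        for other in strings:
            # Compare by value: identical strings never count as
            # substrings of each other.
            if s != other and s in other:
                found.append(s)
                break  # one match is enough for this entry
    return found


print(find_substrings_brute_force(['hello', 'hello world', 'ell', 'world']))
# ['hello', 'ell', 'world']

# Hypothetical input with a repeated entry: 'ell' is reported once per occurrence.
print(find_substrings_brute_force(['ell', 'hello', 'ell']))
# ['ell', 'ell']
```

If repeated results are unwanted, the set-based variants later in the article avoid them.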

Method 2: Using Sets

A set stores each match only once, so duplicates in the input list cannot produce duplicates in the result. The comparison itself is unchanged: each string is still checked against every other string, so the number of substring checks remains quadratic in the worst case.

Here's an example:

list_of_strings = ['hello', 'hello world', 'ell', 'world']
substrings = set()

for s in list_of_strings:
    if any(s in other for other in list_of_strings if s != other):
        substrings.add(s)

print(sorted(substrings))

Output:

['ell', 'hello', 'world']

This snippet uses a set to store substrings and a generator expression within any() to check whether each string appears as a substring of any other string in the list. The set guarantees that no substring is reported twice. Because sets are unordered, sort the result (or otherwise impose an order) if the output order matters.
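The de-duplication is easiest to see with an input that repeats an entry. The following is a small sketch under that assumption; the function name find_substrings_set and the sample list are made up for illustration.

```python
def find_substrings_set(strings):
    """Collect every entry that occurs inside a different entry, without repeats."""
    found = set()
    for s in strings:
        # any() stops at the first match, like the break in the brute force loop.
        if any(s in other for other in strings if s != other):
            found.add(s)
    return found


# 'ell' appears twice in the input but only once in the result.
result = find_substrings_set(['ell', 'hello', 'ell', 'hello world'])
print(sorted(result))  # sorted() gives a stable order for display
# ['ell', 'hello']
```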

Method 3: Using List Comprehensions

List comprehensions in Python offer a compact way to process all elements in a sequence and filter them based on a condition. This method condenses the operation of checking and collecting substrings into a single, readable line of Python code.

Here's an example:

list_of_strings = ['hello', 'hello world', 'ell', 'world']
substrings = [s for s in list_of_strings if any(s in other for other in list_of_strings if s != other)]

print(substrings)

Output:

['hello', 'ell', 'world']

The provided code uses a list comprehension to iterate over each string in the list and keeps it if it is found as a substring of any other string in the list. The result preserves the order of the input list.
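One possible refinement, offered here as a sketch rather than as part of the original method: a string can only be a substring of a different string if that other string is strictly longer. Replacing the s != other test with a length comparison gives the same result and skips the substring search for candidates that are too short. The function name find_substrings_pruned is hypothetical.

```python
def find_substrings_pruned(strings):
    """List-comprehension variant that only searches strictly longer strings."""
    return [
        s for s in strings
        # If s occurs inside other and the two differ, other must be longer,
        # so the length test replaces the s != other check.
        if any(len(s) < len(other) and s in other for other in strings)
    ]


print(find_substrings_pruned(['hello', 'hello world', 'ell', 'world']))
# ['hello', 'ell', 'world']
```

Whether this saves measurable time depends on the data, so it is worth timing on a representative list before adopting it.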

Method 4: Using Filter and Lambda Functions

The built-in filter() function can be combined with a lambda function to identify substrings. This method is particularly expressive and aligns well with functional programming paradigms in Python.

Here's an example:

list_of_strings = ['hello', 'hello world', 'ell', 'world']
substrings = list(filter(lambda s: any(s in other for other in list_of_strings if s != other), list_of_strings))

print(substrings)

Output:

['hello', 'ell', 'world']

This snippet applies a filter on the list of strings, passing a lambda function that checks if a string is a substring of any other strings in the list. The use of filter() and lambda provides a clear and concise method to obtain the same result.
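If the lambda becomes hard to read, the predicate can be given a name instead. The sketch below assumes a helper called is_substring_of_another (a name invented for this example) and binds the list to it with functools.partial from the standard library.

```python
from functools import partial


def is_substring_of_another(s, strings):
    """True if s occurs inside some entry of strings that differs from s."""
    return any(s in other for other in strings if s != other)


list_of_strings = ['hello', 'hello world', 'ell', 'world']

# partial() fixes the strings argument, leaving a one-argument predicate for filter().
predicate = partial(is_substring_of_another, strings=list_of_strings)
substrings = list(filter(predicate, list_of_strings))

print(substrings)
# ['hello', 'ell', 'world']
```

A named predicate can also be tested and reused on its own, which a lambda defined inline cannot.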

Bonus One-Liner Method 5: Using Functional Programming

If you enjoy concise one-liners, Python's comprehensions can find the substrings in a single expression, albeit at the cost of potential readability for those unfamiliar with this style.

Here's an example:

list_of_strings = ['hello', 'hello world', 'ell', 'world']
substrings = sorted({s for s in list_of_strings if any(s in other for other in list_of_strings if s != other)})

print(substrings)

Output:

['ell', 'hello', 'world']

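By building the result with a set comprehension and converting it to a list, we drop duplicates in a single expression. The work done is the same pairwise comparison as in the earlier methods, and because sets are unordered, the order of the result is not guaranteed unless it is sorted.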
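When the original order of the list should survive de-duplication, one alternative is dict.fromkeys(), which keeps the first occurrence of each key in insertion order (guaranteed for dicts since Python 3.7). This is a sketch of that alternative, not the one-liner from the article, and the sample list with a repeated 'ell' is an assumption made for illustration.

```python
list_of_strings = ['hello', 'ell', 'hello world', 'ell', 'world']

# Generator of matches, in input order and possibly with repeats.
matches = (
    s for s in list_of_strings
    if any(s in other for other in list_of_strings if s != other)
)

# dict.fromkeys() keeps only the first occurrence of each match, in order.
substrings = list(dict.fromkeys(matches))

print(substrings)
# ['hello', 'ell', 'world']
```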

Summary/Discussion

  • Method 1: Brute Force. Simple implementation but can be slow with large data sets due to nested loops. Repeated input strings can appear more than once in the result.
  • Method 2: Using Sets. Performs the same pairwise comparisons as brute force, but the set removes duplicate results. The result has no guaranteed order.
  • Method 3: Using List Comprehensions. Clean and Pythonic way to find substrings that preserves input order. It does the same amount of comparison work as the brute force method and may be less readable for new Python programmers.
  • Method 4: Using Filter and Lambda Functions. A functional programming approach that can be more expressive, and it reads almost like natural language.
  • Method 5: One-Liner Functional Programming. It's the shortest way to find substrings and removes duplicates, though it is not inherently faster than the other methods and may compromise on readability for those less familiar with this style.
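Since all five methods perform the same pairwise comparisons, any speed difference is best measured rather than assumed. The harness below is a rough sketch using the standard timeit module. The function names, the random test data, and the repetition count are all arbitrary choices for illustration, and the timings it prints will vary by machine.

```python
import random
import string
import timeit


def brute_force(strings):
    found = []
    for s in strings:
        for other in strings:
            if s != other and s in other:
                found.append(s)
                break
    return found


def with_set(strings):
    return {s for s in strings if any(s in o for o in strings if s != o)}


def with_comprehension(strings):
    return [s for s in strings if any(s in o for o in strings if s != o)]


def with_filter(strings):
    return list(filter(lambda s: any(s in o for o in strings if s != o), strings))


# Hypothetical sample: 200 short random strings over a four-letter alphabet.
random.seed(0)
sample = [
    ''.join(random.choices(string.ascii_lowercase[:4], k=random.randint(1, 8)))
    for _ in range(200)
]

for fn in (brute_force, with_set, with_comprehension, with_filter):
    seconds = timeit.timeit(lambda: fn(sample), number=5)
    print(f"{fn.__name__:20s} {seconds:.4f} s")
```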