5 Best Ways to Filter a List of Strings in Python Using Regex

💡 Problem Formulation: When working with lists of strings in Python, you often need to keep only the strings that match a certain pattern. Regular expressions (regex) make this filtering flexible. For example, given a list of file names, you might want only the files with a .txt extension. The desired output is a list that contains only the strings ending with “.txt”.

Method 1: Using the `re` module with `filter()`

Python’s re module provides regular expression support. By passing the match method of a compiled pattern to the built-in filter() function, we can apply a regex to every string in a list and keep only those that match.

Here’s an example:

import re
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]
# Compile once; match() anchors at the start, hence the leading ".*"
pattern = re.compile(r'.*\.txt$')
filtered_files = list(filter(pattern.match, file_names))
print(filtered_files)

Output:

['report.txt', 'notes.txt']

This code snippet compiles a regex pattern that matches any string ending with “.txt”. Because match() only tries to match at the beginning of a string, the pattern starts with .* to consume everything before the extension. The filter() function calls the pattern’s match method on each element and yields the elements for which it returns a match object rather than None; list() then collects that iterator into a list.
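To make the anchoring behavior concrete, here is a small sketch comparing re.match() and re.search() on the same file name. It also shows one possible variation, assuming you prefer a pattern without the leading .*: passing the compiled pattern’s search method to filter() instead of match.

```python
import re

file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]

# match() only tries the start of the string, so "\.txt$" alone fails here.
print(re.match(r'\.txt$', "report.txt"))               # None
# search() scans the whole string and finds ".txt" at the end.
print(re.search(r'\.txt$', "report.txt") is not None)  # True

# Variation: use the compiled pattern's search method with filter(),
# which makes the ".*" prefix unnecessary.
pattern = re.compile(r'\.txt$')
filtered_files = list(filter(pattern.search, file_names))
print(filtered_files)  # ['report.txt', 'notes.txt']
```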

Method 2: List Comprehension with `re.search()`

List comprehensions are often considered a more Pythonic way to filter lists. Calling re.search() inside a list comprehension expresses the whole regex filter in a single, readable line.

Here’s an example:

import re
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]
# search() returns a match object (truthy) or None
filtered_files = [name for name in file_names if re.search(r'.*\.txt$', name)]
print(filtered_files)

Output:

['report.txt', 'notes.txt']

The list comprehension iterates over each string in the list and keeps it if re.search() finds a match for the pattern. It is more concise than the filter() version and produces the same result. Unlike match(), search() looks for a match anywhere in the string, so the leading .* is not strictly needed here.
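One subtlety of the $ anchor is worth a quick sketch: without the MULTILINE flag, $ also matches just before a trailing newline at the end of the string. This matters if the names come from a source that keeps newlines, such as lines read with readlines(); the entry "notes.txt\n" below is a hypothetical example of that. Anchoring with \Z instead matches only at the true end of the string.

```python
import re

# Hypothetical input: the last entry still carries a trailing newline.
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf", "notes.txt\n"]

# "$" matches at the end OR just before a final "\n", so the last entry slips through.
loose = [name for name in file_names if re.search(r'\.txt$', name)]

# "\Z" matches only at the very end of the string.
strict = [name for name in file_names if re.search(r'\.txt\Z', name)]

print(loose)   # ['report.txt', 'notes.txt', 'notes.txt\n']
print(strict)  # ['report.txt', 'notes.txt']
```

If the newline is unwanted data rather than a signal, stripping each name before filtering is an equally valid fix.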

Method 3: Using `re.findall()`

The re.findall() function returns all non-overlapping matches of a pattern in a string as a list. When filtering a list of strings, it can be used inside a list comprehension to check whether a string contains any match at all.

Here’s an example:

import re
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]
# A non-empty list from findall() is truthy; [] is falsy
filtered_files = [name for name in file_names if re.findall(r'.*\.txt$', name)]
print(filtered_files)

Output:

['report.txt', 'notes.txt']

This approach is similar to Method 2 but uses re.findall() instead of re.search(). The list comprehension filters the original list, including those strings for which re.findall() returns a non-empty list, indicating a match.
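The truthiness test is easier to see by looking at what re.findall() actually returns for a matching and a non-matching name. The sketch below also includes an illustrative pattern with a capturing group: in that case findall() returns the group’s text rather than the whole match, but a non-empty list is still truthy, so the filter keeps working.

```python
import re

pattern = r'.*\.txt$'

# An empty list is falsy, so findall() doubles as a yes/no match test.
matches_txt = re.findall(pattern, "report.txt")
matches_png = re.findall(pattern, "image.png")

# With a capturing group, findall() returns the group text instead of the whole match.
matches_group = re.findall(r'(\w+)\.txt$', "report.txt")

print(matches_txt)    # ['report.txt']
print(matches_png)    # []
print(matches_group)  # ['report']
```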

Method 4: Using `lambda` and `filter()`

For cases where you want more control over the filtering function, you can use a lambda function in combination with the filter() function. Lambda functions allow custom inline expressions without defining a separate function.

Here’s an example:

import re
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]
# The lambda returns a match object (truthy) or None
filtered_files = list(filter(lambda name: re.match(r'.*\.txt$', name), file_names))
print(filtered_files)

Output:

['report.txt', 'notes.txt']

The lambda function takes each file name and applies re.match() with the specified pattern, returning a match object or None. The filter() function keeps the file names for which the lambda’s result is truthy, and list() turns the resulting iterator into a list. Because the lambda can hold any expression, this approach extends easily to criteria that combine a regex with other checks.
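As an illustration of combining a regex with other checks, here is a sketch built on a hypothetical rule: keep .txt files regardless of letter case, but skip anything whose name starts with "draft". The file names and the rule are made up for the example; re.IGNORECASE is a standard flag of the re module.

```python
import re

# Hypothetical file names for this example.
file_names = ["report.txt", "NOTES.TXT", "draft_report.txt", "image.png"]

# Hypothetical rule: any-case ".txt" extension, excluding drafts.
filtered_files = list(filter(
    lambda name: re.search(r'\.txt$', name, re.IGNORECASE) and not name.startswith('draft'),
    file_names,
))
print(filtered_files)  # ['report.txt', 'NOTES.TXT']
```

Once the condition grows beyond one line, a small named function passed to filter() is usually easier to read and test than a long lambda.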

Bonus One-Liner Method 5: Using `fnmatch.filter()`

For simpler patterns that don’t require full regex capabilities, the fnmatch module provides a filter function that matches using Unix shell-style wildcards. While not a true regex solution, it is handy for basic patterns.

Here’s an example:

import fnmatch
file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]
# "*" is a shell-style wildcard here, not a regex quantifier
filtered_files = fnmatch.filter(file_names, '*.txt')
print(filtered_files)

Output:

['report.txt', 'notes.txt']

This one-liner uses fnmatch.filter() to filter the list directly with a wildcard pattern. It’s a quick solution when you don’t need full regular expressions, but it only understands the wildcards *, ?, [seq] and [!seq], not regex features such as alternation or quantifiers.
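The fnmatch module also bridges to regular expressions. The sketch below shows two related standard functions: fnmatch.translate(), which converts a wildcard pattern into a regex string (the exact string varies between Python versions, so it isn’t printed), and fnmatch.fnmatchcase(). The latter matters because fnmatch.filter() normalizes case with os.path.normcase, so it is case-insensitive on Windows, whereas fnmatchcase() always compares case-sensitively.

```python
import fnmatch
import re

file_names = ["report.txt", "image.png", "notes.txt", "graph.pdf"]

# Convert the wildcard into a regex so it can drive the re-based methods.
regex = re.compile(fnmatch.translate('*.txt'))
via_regex = [name for name in file_names if regex.match(name)]

# Case-sensitive matching on every platform.
case_sensitive = [name for name in file_names if fnmatch.fnmatchcase(name, '*.txt')]

print(via_regex)       # ['report.txt', 'notes.txt']
print(case_sensitive)  # ['report.txt', 'notes.txt']
```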

Summary/Discussion

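Method 1: Using the re module with filter(). Strengths: leverages built-in functions, readable. Weaknesses: needs a compiled pattern object to pass its match method, and because match() anchors at the start of the string, the pattern needs a leading .*.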
Method 2: List Comprehension with re.search(). Strengths: Pythonic, concise. Weaknesses: might be less readable for beginners.
Method 3: Using re.findall(). Strengths: works like search, simple to understand. Weaknesses: does more work than needed, because it scans each whole string and builds a list of every match when only a yes/no answer is required.
Method 4: Using lambda and filter(). Strengths: offers custom inline expressions, flexible. Weaknesses: can decrease readability with complex expressions.
Method 5: Using fnmatch.filter(). Strengths: simplicity, good for basic patterns. Weaknesses: not a full regex implementation, limited pattern matching.
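A rough way to check the efficiency point about Method 3 is to time re.search() against re.findall() yourself. This sketch uses made-up strings in which the unanchored pattern \.txt occurs 1,000 times each, so findall() has to collect every occurrence while search() can stop at the first. Timings depend on the machine and Python version, so no numbers are quoted here; with an end-anchored pattern like .*\.txt$ the gap is likely to be much smaller.

```python
import re
import timeit

# Hypothetical input: 100 long strings, each containing ".txt" 1,000 times.
names = ["a.txt" * 1000 for _ in range(100)]
pattern = re.compile(r'\.txt')

def with_search():
    # search() stops at the first match in each string.
    return [n for n in names if pattern.search(n)]

def with_findall():
    # findall() builds a list of all 1,000 matches per string.
    return [n for n in names if pattern.findall(n)]

# Both keep the same strings; only the work done per string differs.
assert with_search() == with_findall()

print("search: ", timeit.timeit(with_search, number=10))
print("findall:", timeit.timeit(with_findall, number=10))
```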
