💡 Problem Formulation: When working with a list of strings in Python, a common task is to filter the list based on specific patterns. Regular Expressions (regex) are a powerful way of defining these patterns, enabling complex matching criteria that can go well beyond simple substring checks.
Let’s see how we can use regex to filter a list of strings in Python.
Method 1: Using re.match and List Comprehension
The re.match function checks whether a string starts with the specified pattern. Pairing re.match with a list comprehension is a common and readable way to filter lists.
Here’s how you can use it:
import re

strings = ["foo123", "bar", "baz123", "qux"]
pattern = re.compile(r'^\d+')  # regex to match strings starting with digits
filtered_list = [s for s in strings if not pattern.match(s)]
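This code compiles a regex pattern that matches any string starting with one or more digits. The list comprehension keeps only the strings that do not match, so anything beginning with a digit is filtered out. Because re.match already anchors at the start of the string, the leading ^ is redundant but harmless. Note that none of the four sample strings begins with a digit, so filtered_list ends up equal to the original list here.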
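To see the filter actually drop something, here is a minimal sketch using a hypothetical list (samples is not part of the example above) in which two items begin with digits. It also omits the ^ to show that re.match anchors at the start on its own.

```python
import re

# Hypothetical sample data: two of these strings start with digits.
samples = ["123abc", "foo123", "7up", "bar"]

# No leading ^ needed: match() only ever tests at the start of the string.
starts_with_digits = re.compile(r'\d+')

kept = [s for s in samples if not starts_with_digits.match(s)]
dropped = [s for s in samples if starts_with_digits.match(s)]

print(kept)     # ['foo123', 'bar']
print(dropped)  # ['123abc', '7up']
```

Dropping the "not" turns the same comprehension into a keep-only-matches filter, as the dropped list shows.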
Method 2: Using re.search() and filter()
Another way is to use re.search(), which scans the entire string for the pattern rather than testing only at the start. Combined with the built-in filter() function, it can be applied to the list.
Here’s an example:
import re

strings = ["foo", "123bar", "baz", "qux123"]
pattern = re.compile(r'123')  # regex to match strings containing '123'
filtered_list = list(filter(lambda s: not pattern.search(s), strings))
In this snippet, pattern.search looks for the literal substring '123' anywhere in each string, and filter keeps only the strings for which the lambda returns True, that is, those without a match. The result is a list in which none of the strings contain '123'.
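The choice between re.search and re.match changes the result on this very list. The sketch below runs both on the same data, and also shows itertools.filterfalse from the standard library as one possible alternative to the negating lambda. The variable names are illustrative only.

```python
import re
from itertools import filterfalse

strings = ["foo", "123bar", "baz", "qux123"]
pattern = re.compile(r'123')

# search() scans the whole string; match() only tests at position 0.
without_123_anywhere = [s for s in strings if not pattern.search(s)]
without_123_at_start = [s for s in strings if not pattern.match(s)]

# filterfalse keeps the items for which the predicate is falsy,
# i.e. the strings where search() returned None.
via_filterfalse = list(filterfalse(pattern.search, strings))

print(without_123_anywhere)  # ['foo', 'baz']
print(without_123_at_start)  # ['foo', 'baz', 'qux123']
print(via_filterfalse)       # ['foo', 'baz']
```

With re.match, "qux123" survives because its '123' is not at the start of the string.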
Method 3: Using re.fullmatch and a Function
If we need to check whether the entire string strictly conforms to a pattern, re.fullmatch() is our tool.
We can define a function to encapsulate our filtering logic:
import re

def filter_strings(strings, regex):
    pattern = re.compile(regex)
    return [s for s in strings if not pattern.fullmatch(s)]

strings = ["abc", "a1b2c3", "123", "xyz"]
regex = r'\d+'  # regex to match strings that are fully numeric
filtered_list = filter_strings(strings, regex)

This function compiles the provided regex and filters the list using list comprehension. Only strings that aren’t fully numeric as per the regex remain.
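As a quick check of what the function returns, the sketch below repeats it and prints the result. The last two lines use a hypothetical extra string, "123abc", to contrast fullmatch with match.

```python
import re

def filter_strings(strings, regex):
    pattern = re.compile(regex)
    # Drop only the strings that match the pattern from first to last character.
    return [s for s in strings if not pattern.fullmatch(s)]

strings = ["abc", "a1b2c3", "123", "xyz"]
result = filter_strings(strings, r'\d+')
print(result)  # ['abc', 'a1b2c3', 'xyz']

# Hypothetical extra string: digits at the start, letters after.
full = bool(re.fullmatch(r'\d+', "123abc"))   # False: 'abc' is left over
prefix = bool(re.match(r'\d+', "123abc"))     # True: the prefix '123' matches
print(full, prefix)  # False True
```

Only "123" is removed, because it is the only string made up entirely of digits; "a1b2c3" contains digits but is not fully numeric.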
Method 4: Precompiled Regex and Generators
For large data sets, using generators can save memory. Let’s pair a precompiled regex with a generator expression to filter our list:
import re

strings = ["foo", "baz1", "2bar", "123foo"]
pattern = re.compile(r'[^0-9]+')  # regex to match strings without any digits
filtered_list = (s for s in strings if pattern.fullmatch(s))

# Use the generator
for valid_string in filtered_list:
    print(valid_string)
This method is memory-efficient because filtered_list doesn’t actually hold the entire filtered data at once; it generates filtered items on the fly.
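One consequence of this laziness is that a generator can be consumed only once. A minimal sketch, with illustrative variable names, that makes this visible:

```python
import re

strings = ["foo", "baz1", "2bar", "123foo"]
pattern = re.compile(r'[^0-9]+')

# Nothing is matched yet; the generator only records how to produce items.
lazy = (s for s in strings if pattern.fullmatch(s))

first_pass = list(lazy)   # runs the regex over every string
second_pass = list(lazy)  # the generator is already exhausted

print(first_pass)   # ['foo']
print(second_pass)  # []
```

If the filtered items are needed more than once, either rebuild the generator or materialize it into a list, at the cost of the memory saving.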
Method 5: Using re.findall and Custom Filtering Logic
At times, you might want to use the more versatile re.findall() to customize your filtering criteria further. Below is a way to utilize this approach:
import re

strings = ["hello123", "test", "world12345", "regex"]
pattern = re.compile(r'123')

def custom_filter(strings, pattern):
    return [s for s in strings if not pattern.findall(s)]

filtered_list = custom_filter(strings, pattern)
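In this case, the custom_filter function looks for instances of '123' and returns a new list without the strings containing that pattern. findall returns a list of all matches, and an empty list is falsy, so "not pattern.findall(s)" is True only when nothing matched. For a plain yes/no test like this one, pattern.search gives the same result and can stop at the first match; findall becomes useful when the matches themselves, or their count, feed into the filter.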
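One case where findall does more than a yes/no test is filtering on the number of matches. The sketch below is an illustration only: keep_with_min_matches and its min_matches parameter are hypothetical names, and "a1b22c333" is an extra string added to the sample list.

```python
import re

# Sample list from above plus one hypothetical string with several digit runs.
strings = ["hello123", "test", "world12345", "regex", "a1b22c333"]
digit_runs = re.compile(r'\d+')

def keep_with_min_matches(strings, pattern, min_matches):
    # findall returns every non-overlapping match, so len() counts them.
    return [s for s in strings if len(pattern.findall(s)) >= min_matches]

runs = digit_runs.findall("a1b22c333")
multi = keep_with_min_matches(strings, digit_runs, 2)

print(runs)   # ['1', '22', '333']
print(multi)  # ['a1b22c333']
```

"hello123" and "world12345" each contain a single run of digits, so they fall below the threshold of two.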
Bonus One-Liner Method 6: Inline Regex with filter
Finally, for a quick, one-liner method, Python’s filter function can be used with an inline regex and re.match, like so:
import re

strings = ["data1", "info100", "data", "statistics"]
filtered_list = list(filter(lambda s: not re.match(r'.*\d', s), strings))
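Here, we’re filtering out strings that contain a digit by negating the regex match directly within filter. Because .* can consume any prefix, the pattern .*\d matches a digit anywhere in the string, not only at the end; in this sample list the digits happen to sit at the end, leaving "data" and "statistics".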
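To restrict the filter to strings whose last character is a digit, one option is re.search with the pattern \d$. The sketch below compares it with .*\d on a hypothetical list (samples) that adds "v2beta", a string with a digit in the middle.

```python
import re

# Hypothetical data: "v2beta" has a digit, but not at the end.
samples = ["data1", "info100", "v2beta", "data", "statistics"]

# .*\d with match(): true for any string containing a digit.
no_digit_anywhere = list(filter(lambda s: not re.match(r'.*\d', s), samples))

# \d$ with search(): true only when the string ends with a digit.
no_trailing_digit = list(filter(lambda s: not re.search(r'\d$', s), samples))

print(no_digit_anywhere)  # ['data', 'statistics']
print(no_trailing_digit)  # ['v2beta', 'data', 'statistics']
```

The two filters disagree only on "v2beta", which is exactly the case the two descriptions differ on.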