Problem Formulation: When working with a list of strings in Python, a common task is to filter the list based on specific patterns. Regular Expressions (regex) are a powerful way of defining these patterns, enabling complex matching criteria that can go well beyond simple substring checks.
Let’s see how we can use regex to filter a list of strings in Python.
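To make the "beyond simple substring checks" point concrete, here is a small sketch with hypothetical sample strings. An `in` test only asks whether '123' occurs somewhere, while a regex can describe the shape of the whole string (here: lowercase letters followed by exactly three digits).

```python
import re

strings = ["abc123", "abc1234", "123abc", "xyz999"]

# Substring check: any string that contains '123' somewhere
by_substring = [s for s in strings if "123" in s]

# Regex check: the entire string must be lowercase letters followed by exactly three digits
by_pattern = [s for s in strings if re.fullmatch(r'[a-z]+\d{3}', s)]

print(by_substring)  # ['abc123', 'abc1234', '123abc']
print(by_pattern)    # ['abc123', 'xyz999']
```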
Method 1: Using re.match and List Comprehension
The re.match() function checks whether the pattern matches at the beginning of the string. Pairing re.match() with a list comprehension is a common and readable way to filter lists.
Here’s how you can use it:
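import re

strings = ["foo123", "bar", "baz123", "qux"]
pattern = re.compile(r'^\d+')  # matches strings that start with one or more digits
# Keep only the strings that do NOT start with a digit
filtered_list = [s for s in strings if not pattern.match(s)]
print(filtered_list)  # ['foo123', 'bar', 'baz123', 'qux'] (none start with a digit)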
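This code compiles a pattern that matches strings beginning with one or more digits, and the list comprehension keeps only the strings that do not match it. Because match() only tries the start of the string, the ^ anchor is redundant here (though harmless). None of the sample strings begins with a digit, so all four are kept.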
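To see this kind of filter actually separate something, here is a minimal sketch with sample data that does include strings beginning with digits. The helper name keep_starting_with_digits and the sample list are hypothetical. It keeps the matches rather than discarding them, and drops the ^ since match() anchors at the start anyway.

```python
import re

def keep_starting_with_digits(strings):
    # Hypothetical helper: match() only tries position 0, so no '^' is needed
    pattern = re.compile(r'\d+')
    return [s for s in strings if pattern.match(s)]

items = ["42abc", "abc42", "7", "x7y"]
kept = keep_starting_with_digits(items)
dropped = [s for s in items if s not in kept]

print(kept)     # ['42abc', '7']
print(dropped)  # ['abc42', 'x7y']
```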
Method 2: Using re.search() and filter()
Another option is re.search(), which scans the entire string and succeeds if the pattern matches anywhere in it. Combined with the built-in filter() function, it can be applied to every element of the list.
Here’s an example:
import re

strings = ["foo", "123bar", "baz", "qux123"]
pattern = re.compile(r'123')  # matches any string containing '123'
# Keep only the strings in which '123' appears nowhere
filtered_list = list(filter(lambda s: not pattern.search(s), strings))
print(filtered_list)  # ['foo', 'baz']
In this snippet, pattern.search() looks for the substring '123' anywhere in each string. filter() keeps only the elements for which the lambda returns True, so the resulting list contains no string with '123' in it.
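The difference between match() and search() is easy to miss. This sketch reuses the sample data from above and applies the same compiled pattern both ways. The variable names anchored and anywhere are just for illustration.

```python
import re

strings = ["foo", "123bar", "baz", "qux123"]
pattern = re.compile(r'123')

anchored = [s for s in strings if pattern.match(s)]   # only tries position 0
anywhere = [s for s in strings if pattern.search(s)]  # scans the whole string

print(anchored)  # ['123bar']
print(anywhere)  # ['123bar', 'qux123']
```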
Method 3: Using re.fullmatch and a Function
If we need to check if the entire string strictly conforms to a pattern, re.fullmatch() is our tool.
We can define a function to encapsulate our filtering logic:
import re

def filter_strings(strings, regex):
    pattern = re.compile(regex)
    # Drop every string that the pattern matches in its entirety
    return [s for s in strings if not pattern.fullmatch(s)]

strings = ["abc", "a1b2c3", "123", "xyz"]
regex = r'\d+'  # regex to match strings that are fully numeric
filtered_list = filter_strings(strings, regex)
print(filtered_list)  # ['abc', 'a1b2c3', 'xyz']
This function compiles the supplied regex and filters the list with a list comprehension. Only the strings that are not entirely numeric remain: "123" is dropped, while "a1b2c3" is kept because it also contains letters.
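The distinction between match() and fullmatch() matters here. match() is satisfied by a matching prefix, whereas fullmatch() requires the pattern to consume the entire string. Here is a small comparison using hypothetical sample strings:

```python
import re

pattern = re.compile(r'\d+')
samples = ["123", "123abc", "abc123"]

# Record (match succeeded?, fullmatch succeeded?) for each sample
results = {s: (bool(pattern.match(s)), bool(pattern.fullmatch(s))) for s in samples}

for s, (m, fm) in results.items():
    print(f"{s!r}: match={m}, fullmatch={fm}")
# '123': match=True, fullmatch=True
# '123abc': match=True, fullmatch=False
# 'abc123': match=False, fullmatch=False
```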
Method 4: Precompiled Regex and Generators
For large data sets, using generators can save memory. Let’s pair a precompiled regex with a generator expression to filter our list:
import re
strings = ["foo", "baz1", "2bar", "123foo"]
pattern = re.compile(r'[^0-9]+') # regex to match strings without any digits
filtered_list = (s for s in strings if pattern.fullmatch(s))  # lazy: nothing is filtered yet
# Use the generator
for valid_string in filtered_list:
    print(valid_string)  # prints only: foo
This method is memory-efficient because filtered_list is a generator: it does not hold the filtered data all at once, but produces each matching item on the fly as the loop requests it.
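The memory benefit shows up most clearly when the input is itself lazy, such as a file read line by line. In this sketch, io.StringIO stands in for a real file opened with open() (an assumption made so the example is self-contained). Each line is stripped and tested only when list() asks for it, and a generator can be consumed only once.

```python
import io
import re

# io.StringIO simulates a large text file; with a real file, use open(path) instead
source = io.StringIO("foo\nbaz1\n2bar\nqux\n123foo\n")
pattern = re.compile(r'[^0-9]+')  # the whole line must consist of non-digits

# Chained generators: nothing is read or filtered until the results are consumed
stripped = (line.rstrip("\n") for line in source)
no_digit_lines = (s for s in stripped if pattern.fullmatch(s))

result = list(no_digit_lines)
print(result)                # ['foo', 'qux']
print(list(no_digit_lines))  # [] because the generator is already exhausted
```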
Method 5: Using re.findall and Custom Filtering Logic
At times you may want re.findall(), which returns a list of all non-overlapping matches rather than a single match object, giving you more room to customize the filtering criteria. Below is one way to use this approach:
import re
strings = ["hello123", "test", "world12345", "regex"]
pattern = re.compile(r'123')
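def custom_filter(strings, pattern):
    # findall() returns an empty (falsy) list when there is no match
    return [s for s in strings if not pattern.findall(s)]

filtered_list = custom_filter(strings, pattern)
print(filtered_list)  # ['test', 'regex']
In this case, the custom_filter function looks for occurrences of '123' and returns a new list without the strings that contain it. For a simple yes/no check like this one, search() does the same job more cheaply, since it stops at the first match instead of collecting all of them.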
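Because findall() hands back the matches themselves, the filter condition can depend on what was found, not only on whether something was found. As a sketch with made-up sample data, this keeps only strings that contain at least two separate numbers:

```python
import re

digits = re.compile(r'\d+')  # one run of consecutive digits
strings = ["a1b2", "x10", "no digits", "7 8 9"]

# Keep strings in which findall() finds two or more separate numbers
multi = [s for s in strings if len(digits.findall(s)) >= 2]
print(multi)  # ['a1b2', '7 8 9']
```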
Bonus One-Liner Method 6: Inline Regex with filter
Finally, for a quick one-liner, Python's filter() can be combined with an inline regex passed straight to re.match(), like so:
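import re

strings = ["data1", "info100", "data", "statistics"]
# r'.*\d' matches whenever a digit appears anywhere, so only digit-free strings are kept
filtered_list = list(filter(lambda s: not re.match(r'.*\d', s), strings))
print(filtered_list)  # ['data', 'statistics']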
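Here, we filter out every string that contains a digit anywhere, not just at the end: since .* can absorb any characters before the digit (newlines excepted), re.match(r'.*\d', s) succeeds whenever some digit is present. To drop only strings that end with a digit, anchor the pattern instead, for example re.search(r'\d$', s).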
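The difference between "contains a digit" and "ends with a digit" is easiest to see on a string like "v2beta", which contains a digit but does not end with one. Here is a minimal sketch with that hypothetical extra sample:

```python
import re

strings = ["data1", "info100", "data", "statistics", "v2beta"]

# Drop strings with a digit anywhere (equivalent to re.match(r'.*\d', s) for strings without newlines)
no_digit = list(filter(lambda s: not re.search(r'\d', s), strings))

# Drop only strings whose last character is a digit
no_trailing_digit = list(filter(lambda s: not re.search(r'\d$', s), strings))

print(no_digit)           # ['data', 'statistics']
print(no_trailing_digit)  # ['data', 'statistics', 'v2beta']
```

Module-level functions such as re.search() compile and cache patterns internally, so inline patterns are fine in short scripts. Precompiling with re.compile() mainly helps readability and skips the cache lookup in hot loops.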
