Python Filter List Based on Substring (Contains)

💡 Problem Formulation: When working with a list of strings in Python, you often need to filter the list based on the presence or absence of a certain substring.

Here is a succinct example:

✅ Given the list ['cat', 'carpet', 'castle', 'dog'] and the substring 'ca', the challenge is to produce an output that contains only the strings that include 'ca', i.e., ['cat', 'carpet', 'castle'].

Method 1: Using List Comprehension

A list comprehension provides a compact and readable way to filter lists. It builds a new list by iterating over the original list and keeping only the items for which an optional if condition is true.

Here’s a simple example:

words = ['cat', 'carpet', 'castle', 'dog']
substring = 'ca'
filtered_words = [word for word in words if substring in word]
print(filtered_words)

In the example above, the list comprehension iterates over each word in the words list and includes the word in the new list only if it contains the substring 'ca'.
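The problem formulation also mentions filtering on the absence of a substring. As a minimal sketch, the same comprehension can be negated with not in, or made case-insensitive by lowercasing both sides. The mixed-case word list and the variable names below are chosen for illustration only.

```python
# Illustrative list with one capitalized word (not from the example above).
words = ['cat', 'Carpet', 'castle', 'dog']
substring = 'ca'

# Keep only the words that do NOT contain the substring (case-sensitive).
without_substring = [word for word in words if substring not in word]

# Case-insensitive match: compare lowercased copies, keep the original word.
case_insensitive = [word for word in words if substring.lower() in word.lower()]

print(without_substring)  # ['Carpet', 'dog']
print(case_insensitive)   # ['cat', 'Carpet', 'castle']
```

Note that 'Carpet' counts as not containing 'ca' in the first result because the in operator is case-sensitive.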

Method 2: Using the filter Function with a Lambda

The filter() function returns an iterator that yields only those elements of the original list for which a given function returns a true value. Combined with a lambda function, it offers a functional approach to filtering lists.

words = ['cat', 'carpet', 'castle', 'dog']
substring = 'ca'
filtered_words = list(filter(lambda word: substring in word, words))
print(filtered_words)

This piece of code creates a filter object that only contains words including the 'ca' substring. The list() constructor converts the filter object into a list.
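Because filter() returns an iterator rather than a list, the matches are computed lazily and can be consumed only once. The following sketch illustrates this behavior; the names first, rest, and again are introduced here purely for demonstration.

```python
words = ['cat', 'carpet', 'castle', 'dog']
substring = 'ca'

# No word has been tested yet; filter() only sets up the iterator.
matches = filter(lambda word: substring in word, words)

# next() pulls a single match from the iterator.
first = next(matches)   # 'cat'

# list() consumes whatever is left.
rest = list(matches)    # ['carpet', 'castle']

# The iterator is now exhausted, so a second pass yields nothing.
again = list(matches)   # []

print(first, rest, again)
```

If you need to traverse the result more than once, convert it to a list first, as in the example above.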

Method 3: Using a Regular Expression with the re Module

If your substring conditions are complex, regular expressions provide a powerful method for filtering strings. The re module in Python deals with regular expressions.

Here’s a simple example:

import re

words = ['cat', 'carpet', 'castle', 'dog']
pattern = re.compile('ca')
filtered_words = [word for word in words if pattern.search(word)]
print(filtered_words)

Here, re.compile('ca') compiles a regular expression pattern that is used to search within the strings. The list comprehension filters the list based on the presence of the pattern.
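One caveat worth illustrating: if the substring contains characters that have a special meaning in regular expressions, such as a dot, the pattern may match more than intended. A minimal sketch, assuming you want a literal match, is to pass the substring through re.escape() first. The file names below are made up for the demonstration.

```python
import re

# Hypothetical file names used only to show the difference.
names = ['report.txt', 'report_txt', 'notes.txt', 'data.csv']
literal = '.txt'

unescaped = re.compile(literal)             # '.' matches any character
escaped = re.compile(re.escape(literal))    # '.' matches only a literal dot

loose = [name for name in names if unescaped.search(name)]
strict = [name for name in names if escaped.search(name)]

print(loose)   # ['report.txt', 'report_txt', 'notes.txt']
print(strict)  # ['report.txt', 'notes.txt']
```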

Method 4: Using a Function

Defining a custom function to check for the substring gives you full control over the filtering process and a descriptive name for it, which is especially useful for more complex conditions.

Here’s a simple example:

def contains_substring(substring, word_list):
    return [word for word in word_list if substring in word]

words = ['cat', 'carpet', 'castle', 'dog']
substring = 'ca'
filtered_words = contains_substring(substring, words)
print(filtered_words)

The function contains_substring performs the same task as in previous examples but adds readability by hiding the list comprehension within a function.
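As an example of a more complex condition, here is a sketch of a variant that accepts several substrings and keeps a word if it contains at least one of them. The function name contains_any is hypothetical and not part of the original example.

```python
def contains_any(substrings, word_list):
    """Return the words that contain at least one of the given substrings."""
    return [word for word in word_list
            if any(sub in word for sub in substrings)]

words = ['cat', 'carpet', 'castle', 'dog']
filtered_words = contains_any(['do', 'tle'], words)
print(filtered_words)  # ['castle', 'dog']
```

Replacing any() with all() would instead keep only the words that contain every substring.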

Method 5: Using pandas str.contains() with Boolean Indexing

If you are working with pandas Series objects instead of plain lists, you can use the str.contains() method to build a Boolean mask and index the Series with it. This is the idiomatic pandas way to perform the operation.

Here’s a simple example:

import pandas as pd

words = pd.Series(['cat', 'carpet', 'castle', 'dog'])
substring = 'ca'
filtered_words = words[words.str.contains(substring)]
print(filtered_words.tolist())

The call words.str.contains(substring) returns a Boolean Series. Using it as an index into the words Series keeps only the matching items, and tolist() converts the result back into a plain Python list.
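By default, str.contains() interprets its argument as a regular expression, and missing values do not produce a clean True or False. The sketch below assumes you want a literal substring match and want missing entries treated as non-matches, so it passes regex=False and na=False. The sample data is made up for illustration.

```python
import pandas as pd

# Illustrative data: one entry contains a literal dot, one is missing.
words = pd.Series(['cat', 'c.a', 'castle', None])
substring = '.'

# regex=False: treat '.' as a plain character, not "any character".
# na=False: treat missing values as "does not contain".
mask = words.str.contains(substring, regex=False, na=False)
filtered = words[mask].tolist()

print(filtered)  # ['c.a']
```

Without regex=False, the dot would match any character and every non-missing entry would be kept.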

Bonus One-Liner Method 6: Using fnmatch Module

The fnmatch module provides support for Unix shell-style wildcards, which can be used to filter a list of strings.

Here’s a simple example:

from fnmatch import fnmatchcase

words = ['cat', 'carpet', 'castle', 'dog']
substring = '*ca*'
filtered_words = [word for word in words if fnmatchcase(word, substring)]
print(filtered_words)

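In the example, fnmatchcase performs a case-sensitive match of each word against the Unix shell-style wildcard pattern '*ca*'. The asterisks on both sides match any sequence of characters, so the pattern matches every word that contains 'ca' anywhere.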
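For an actual one-liner, the same module also offers fnmatch.filter(), which applies a pattern to a whole list at once. This is a sketch rather than a drop-in replacement: unlike fnmatchcase, fnmatch.filter() normalizes case according to the operating system's conventions, and any *, ?, or [ characters in the substring are still treated as wildcards.

```python
import fnmatch

words = ['cat', 'carpet', 'castle', 'dog']
substring = 'ca'

# Wrap the substring in '*' wildcards and filter the whole list in one call.
filtered_words = fnmatch.filter(words, f'*{substring}*')
print(filtered_words)  # ['cat', 'carpet', 'castle']
```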

Summary/Discussion

Filtering a list of strings based on a substring is a common task in Python.

Whether you prefer the succinctness of list comprehensions, the functional style of filter() and lambda, or the power of regular expressions, Python offers multiple methods to accomplish this.

Choosing the right method depends on your specific needs, readability, and performance considerations. The examples provided in this article give a practical starting point for implementing these methods.
