The question reveals that there may be some gap in understanding the basics of Python’s regular expression library.
How to match an exact word or string using a regular expression in Python?
So if you’re an impatient person, here’s the short answer:
To match an exact string 'hello' partially in 'hello world', use the simple regex 'hello'. However, a simpler and more Pythonic approach is the in keyword in the membership expression 'hello' in 'hello world'.
For a full match, use the start and end symbols: the regex '^hello$' would not match the string 'hello world', but it would match 'hello'.
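As a quick sanity check of the short answer, here is a minimal sketch that compares the three approaches side by side. The last two lines show a caveat that is easy to miss: '$' also matches just before a trailing newline, so re.fullmatch() may be the stricter choice when that matters.

```python
import re

# `in` and re.search() both accept a partial match.
print('hello' in 'hello world')                   # True
print(bool(re.search('hello', 'hello world')))    # True

# Anchoring with ^ and $ rejects the longer string.
print(bool(re.search('^hello$', 'hello world')))  # False
print(bool(re.search('^hello$', 'hello')))        # True

# Caveat: $ also matches just before a trailing newline.
print(bool(re.search('^hello$', 'hello\n')))      # True
print(bool(re.fullmatch('hello', 'hello\n')))     # False
```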
So far so good. But let’s dive into some more specific questions—because you may not exactly have looked for this simplistic answer.
In fact, there are multiple ways of understanding your question and I have tried to find all interpretations and answered them one by one in this tutorial:
- How to check membership of a word in a string using no library?
- How to match an exact string using Python’s regex library?
- How to match a word in a string using word boundaries?
- How to match a word in a string (case insensitive)?
- How to find all occurrences of a word in a string?
- How to find all lines containing an exact word?
Let’s dive into each of them in the remaining article to learn and improve your regex superpowers!
How to Check Membership of a Word in a String (Python Built-In)?
To match an exact string 'hello' in a string such as 'hello world', use the in keyword in the membership expression 'hello' in 'hello world'.
This is the simple answer you’ve already learned.
Instead of matching with a regular expression, it’s often enough to use Python’s in keyword to check membership. Because it is an efficient built-in operation, it’s typically faster and more readable than a regex, and it doesn’t require importing a library.
Thus, you should rely on this method if possible:
>>> 'hello' in 'hello world'
True
The first example shows the most straightforward way of doing it: simply ask Python whether a string is “in” another string. This is called the membership operator and it’s very efficient.
You can also check whether a string does not occur in another string.
>>> 'hi' not in 'hello world'
True
The negative membership operator s1 not in s2 returns True if string s1 does not occur in string s2.
But there’s a problem with the membership operator: the return value is only a Boolean. It tells you whether the string occurs, but not where.
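If all you are missing is the position, one option that stays within the built-ins is str.find(), sketched below. It returns the index of the first occurrence, or -1 if there is none. The variable name text is only for illustration.

```python
text = 'hello world'

print('world' in text)     # True
print(text.find('world'))  # 6
print(text.find('hi'))     # -1
```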
So let’s explore the problem of exact string matching using the regex library next:
How to Match an Exact String (Regex)?
To match an exact string using Python’s regex library re, use the string as a regex. For example, you can call re.search('hello', 'hello world') to match the exact string 'hello' in the string 'hello world' and return a match object.
Here’s how you can match an exact substring in a given string:
>>> import re
>>> re.search('hello', 'hello world')
<re.Match object; span=(0, 5), match='hello'>
After importing Python’s library for regular expression processing, re, you use the re.search(pattern, string) method to find the first occurrence of the pattern in the string.
💡 Related Tutorial: If you’re unsure about the re.search() method, check out my detailed tutorial on this blog.
This returns a match object that wraps a lot of useful information such as the start and stop matching positions and the matching substring.
As you’re looking for exact string matches, the matching substring will always be the same as your searched word.
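Here is a short sketch of how you might read that information from the match object. The second part illustrates an assumption the text above makes silently: using the string as a regex only works as-is if it contains no regex metacharacters. For arbitrary strings, re.escape() is a common way to neutralize them. The variable name literal is only for illustration.

```python
import re

match = re.search('hello', 'hello world')
print(match.group())  # hello
print(match.start())  # 0
print(match.end())    # 5
print(match.span())   # (0, 5)

# A string with a metacharacter: '+' means "one or more" in a regex.
literal = '1+1'
print(re.search(literal, 'what is 1+1?'))                    # None
print(re.search(re.escape(literal), 'what is 1+1?').span())  # (8, 11)
```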
But wait, there’s another problem: you wanted an exact match, right?
Using the previous approach does not help because you’re getting prefix matches of your searched word:
>>> re.search('good', 'goodbye')
<re.Match object; span=(0, 4), match='good'>
When searching for the exact word 'good' in the string 'goodbye', it actually matches the prefix of the word.
Is this what you wanted? If not, read on:
How to Match a Word in a String (Word Boundary \b)?
An exact match of a word will also retrieve matching substrings that occur anywhere in the string.
Here’s an example:
>>> 'no' in 'nobody knows'
True
And another example:
>>> re.search('see', 'dfjkyldsssseels')
<re.Match object; span=(10, 13), match='see'>
What if you want to match only whole words—not exact substrings?
The answer is simple:
To match whole exact words, use the word boundary metacharacter '\b'. This metacharacter matches at the beginning and end of each word, but it doesn’t consume anything. In other words, it simply checks whether a word starts or ends at this position, that is, whether a word character sits next to a non-word character (such as whitespace or punctuation) or next to the start or end of the string.
Here’s how you use the word boundary character to ensure that only whole words match:
>>> import re
>>> re.search(r'\bno\b', 'nobody knows')
>>>
>>> re.search(r'\bno\b', 'nobody knows nothing - no?')
<re.Match object; span=(23, 25), match='no'>
In both examples, you use the same regex '\bno\b' that searches for the exact word 'no' but only if the word boundary character '\b' matches before and after.
In other words, the word 'no' must appear on its own as a separate word. It is not allowed to appear within another sequence of word characters.
As a result, the regex doesn’t match in the string 'nobody knows' but it matches in the string 'nobody knows nothing - no?'.
Note that we use the raw string r'...' to write the regex so that the escape sequence '\b' reaches the regex engine intact.
- Without the raw string, Python itself would interpret '\b' as an escape sequence for a single character, the backspace character, so the regex engine would never see a word boundary.
- With the raw string, all backslashes will just be that: backslashes. The regex engine then interprets the two characters '\' and 'b' as one special metacharacter: the word boundary '\b'.
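The difference between the two spellings is easy to verify. The following minimal sketch compares the plain string '\b' with the raw string r'\b' and shows the effect on the search. The example string 'say no' is only for illustration.

```python
import re

# In a normal string literal, '\b' is one character: backspace (ASCII 8).
print(len('\b'))        # 1
print('\b' == '\x08')   # True

# In a raw string, the backslash and the 'b' stay separate characters.
print(len(r'\b'))       # 2

# Only the raw-string version reaches the regex engine as a word boundary.
print(re.search('\bno\b', 'say no'))                # None
print(re.search(r'\bno\b', 'say no') is not None)   # True
```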
But what if you don’t care whether the word is uppercase, lowercase, or capitalized? In other words:
How to Match a Word in a String (Case Insensitive)?
You can search for an exact word in a string—but ignore capitalization. This way, it’ll be irrelevant whether the word’s characters are lowercase or uppercase.
>>> import re
>>> re.search('no', 'NONONON', flags=re.IGNORECASE)
<re.Match object; span=(0, 2), match='NO'>
>>> re.search('no', 'NONONON', flags=re.I)
<re.Match object; span=(0, 2), match='NO'>
>>> re.search('(?i)no', 'NONONON')
<re.Match object; span=(0, 2), match='NO'>
All three ways are equivalent: they all ignore the capitalization of the word’s letters.
💡 Related Tutorial: If you need to learn more about the flags argument in Python, check out my detailed tutorial on this blog.
The third example uses the in-regex flag (?i) that also means: “ignore the capitalization”.
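The flag combines naturally with the word boundary from the previous section. Below is a minimal sketch of a helper that does both. The function name contains_word and its parameters are hypothetical, not part of the re library, and the sketch assumes the word starts and ends with a word character so that '\b' behaves as expected.

```python
import re

def contains_word(word, text, ignore_case=False):
    """Return True if `word` occurs in `text` as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() keeps any metacharacters in `word` literal.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text, flags=flags) is not None

print(contains_word('no', 'No way'))                    # False
print(contains_word('no', 'No way', ignore_case=True))  # True
print(contains_word('no', 'NOBODY', ignore_case=True))  # False
```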
How to Find All Occurrences of a Word in a String?
Okay, you’re never satisfied, are you? So let’s explore how you can find all occurrences of a word in a string.
In the previous examples, you used the re.search(pattern, string) method to find the first match of the pattern in the string.
Next, you’ll learn how to find all occurrences (not only the first match) by using the re.findall(pattern, string) method.
💡 Related Tutorial: You can also read my blog tutorial about the findall() method that explains all the details.
>>> import re
>>> re.findall('no', 'nononono')
['no', 'no', 'no', 'no']
Your code retrieves all matching substrings.
If you need to find all match objects rather than matching substrings, you can use the re.finditer(pattern, string) method:
>>> for match in re.finditer('no', 'nonononono'):
...     print(match)
...
<re.Match object; span=(0, 2), match='no'>
<re.Match object; span=(2, 4), match='no'>
<re.Match object; span=(4, 6), match='no'>
<re.Match object; span=(6, 8), match='no'>
<re.Match object; span=(8, 10), match='no'>
>>>
The re.finditer(pattern, string) method creates an iterator that iterates over all matches and returns the match objects. This way, you can find all matches and get the match objects as well.
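Both methods also work with the word boundary pattern from earlier, which is probably what you want when counting whole words rather than substrings. Here is a small sketch with an example sentence chosen only for illustration:

```python
import re

text = 'no, nobody said no'

# Without word boundaries, the 'no' inside 'nobody' counts as well.
print(re.findall('no', text))        # ['no', 'no', 'no']

# With word boundaries, only the standalone words remain.
print(re.findall(r'\bno\b', text))   # ['no', 'no']

# finditer() additionally tells you where each whole word sits.
spans = [m.span() for m in re.finditer(r'\bno\b', text)]
print(spans)                         # [(0, 2), (16, 18)]
```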
How to Find All Lines Containing an Exact Word?
Say you want to find all lines that contain the word '42' from a multi-line string in Python. How’d you do it?
The answer makes use of a fine Python regex specialty: the dot regex matches all characters, except the newline character. Thus, the regex '.*' will match all characters in a given line (but then stop).
Here’s how you can use this fact to get all lines that contain a certain word:
>>> import re
>>> s = '''the answer is 42
the answer: 42
42 is the answer
43 is not'''
>>> re.findall('.*42.*', s)
['the answer is 42', 'the answer: 42', '42 is the answer']
Three out of four lines contain the word '42'. The findall() method returns these as strings.
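One thing to keep in mind: the pattern '.*42.*' matches any line containing the substring '42', including a line with '142'. If you want the whole word only, a possible refinement is to add the word boundary '\b', as in this sketch. The sample text is made up for illustration.

```python
import re

s = '''the answer is 42
142 apples
42 is the answer
43 is not'''

# Substring semantics: '142 apples' is included.
print(re.findall('.*42.*', s))
# ['the answer is 42', '142 apples', '42 is the answer']

# Whole-word semantics: '142 apples' is excluded.
print(re.findall(r'.*\b42\b.*', s))
# ['the answer is 42', '42 is the answer']
```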
How to Find All Lines Not Containing an Exact Word?
In the previous section, you’ve learned how to find all lines that contain an exact word.
In this section, you’ll learn how to do the opposite: find all lines that do NOT contain an exact word.
This is a bit more complicated. I’ll show you the code first and explain it afterwards:
import re
s = '''the answer is 42
the answer: 42
42 is the answer
43 is not
the answer
42'''

for match in re.finditer('^((?!42).)*$', s, flags=re.M):
    print(match)

'''
<re.Match object; span=(49, 58), match='43 is not'>
<re.Match object; span=(59, 69), match='the answer'>
'''
You can see that the code successfully matches only the lines that do not contain the string '42'.
How can you do it?
The general idea is to match a line that doesn’t contain the string '42', print it to the shell, and move on to the next line. The re.finditer(pattern, string) method accomplishes this easily by returning an iterator over all match objects.
The regex pattern '^((?!42).)*$' matches the whole line from the first position '^' to the last position '$'.
Related Tutorial: If you need a refresher on the start-of-the-line and end-of-the-line metacharacters, read this 5-min tutorial.
In between, you match an arbitrary number of characters: the asterisk quantifier does that for you.
Which characters do you match? Only those where you don’t have the negative word '42' in your lookahead.
Related Tutorial: If you need a refresher on lookaheads, check out this tutorial.
As the lookahead itself doesn’t consume a character, we need to consume it manually by adding the dot metacharacter '.', which matches all characters except the newline character.
Related Tutorial: As it turns out, there’s also a blog tutorial on the dot metacharacter.
Finally, you need to define the re.MULTILINE flag, in short re.M, because it allows the start ^ and end $ metacharacters to also match at the start and end of each line (not only at the start and end of the whole string).
Together, this regular expression matches all lines that do not contain the specific word '42'.
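If the negative lookahead feels heavy for your use case, a regex-free alternative is to split the string into lines and filter with the membership operator. The sketch below runs both approaches on the same text and checks that they agree. The variable names without_regex and with_regex are only for illustration.

```python
import re

s = '''the answer is 42
the answer: 42
42 is the answer
43 is not
the answer
42'''

# Plain Python: keep every line that does not contain '42'.
without_regex = [line for line in s.splitlines() if '42' not in line]

# Regex from this section: collect the matched lines as strings.
with_regex = [m.group() for m in re.finditer(r'^((?!42).)*$', s, flags=re.M)]

print(without_regex)                # ['43 is not', 'the answer']
print(with_regex == without_regex)  # True
```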
Where to Go From Here?
Summary: You’ve learned multiple ways of matching an exact word in a string.
- You can use the simple Python membership operator.
- You can use a default regex with no special metacharacters.
- You can use the word boundary metacharacter '\b' to match only whole words.
- You can match case-insensitively by using the flags argument re.IGNORECASE.
- You can match not only one but all occurrences of a word in a string by using the re.findall() method.
- And you can match all lines containing and not containing a certain word.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.