How to Match an Exact Word in Python Regex? (Answer: Don’t)

The question reveals that there may be some gap in understanding the basics of Python’s regular expression library.

How to match an exact word or string using a regular expression in Python?

So if you’re an impatient person, here’s the short answer:

To match the exact string 'hello' as a substring of 'hello world', use the simple regex 'hello'. However, a simpler and more Pythonic approach is the in operator in a membership expression such as 'hello' in 'hello world'.

For a full match, use the start and end symbols '^hello$' that would not match the string 'hello world' but it would match 'hello'.
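Here’s a minimal sketch of the full-match idea. It also shows re.fullmatch(), a standard library function not used elsewhere in this article, as an alternative to writing the anchors yourself. The variable names are only for illustration.

```python
import re

# '^hello$' anchors the pattern to the start and the end of the string.
anchored_hit = re.search('^hello$', 'hello')          # match object
anchored_miss = re.search('^hello$', 'hello world')   # None

# re.fullmatch() requires the whole string to match, so no anchors are needed.
full_hit = re.fullmatch('hello', 'hello')              # match object
full_miss = re.fullmatch('hello', 'hello world')       # None

# Caveat: '$' also matches just before a trailing newline, fullmatch() does not.
dollar_newline = re.search('^hello$', 'hello\n')       # match object
full_newline = re.fullmatch('hello', 'hello\n')        # None

print(anchored_hit, anchored_miss, full_hit, full_miss)
print(dollar_newline, full_newline)
```

If a trailing newline must not count as a match, re.fullmatch() is probably the safer choice of the two.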

Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.

So far so good. But let’s dive into some more specific questions—because you may not exactly have looked for this simplistic answer.

In fact, there are multiple ways of understanding your question and I have tried to find all interpretations and answered them one by one in this tutorial:

  • How to check membership of a word in a string using no library?
  • How to match an exact string using Python’s regex library?
  • How to match a word in a string using word boundaries \b?
  • How to match a word in a string (case insensitive)?
  • How to find all occurrences of a word in a string?
  • How to find all lines containing an exact word?

Let’s dive into each of them in the remaining article to learn and improve your regex superpowers!

How to Check Membership of a Word in a String (Python Built-In)?

To check whether the exact string 'hello' occurs in a string such as 'hello world', use the in operator in a membership expression: 'hello' in 'hello world'.

This is the simple answer you’ve already learned.

Instead of matching an exact string with a regex, it’s often enough to use Python’s in keyword to check membership. As this is a very efficient built-in operation, it’s usually faster, more readable, and doesn’t require importing any library.

Thus, you should rely on this method if possible:

>>> 'hello' in 'hello world'
True

The first example shows the most straightforward way of doing it: simply ask Python whether a string is “in” another string. This is called the membership operator and it’s very efficient.

You can also check whether a string does not occur in another string.

Here’s how:

>>> 'hi' not in 'hello world'
True

The negative membership operator s1 not in s2 returns True if string s1 does not occur in string s2.
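As a quick sketch, here are both operators in a small script, together with the built-in str.find() method for the case where you also want the position of the substring. The variable names are made up for this example.

```python
text = 'hello world'

# Membership tests return plain Booleans.
contains_hello = 'hello' in text     # True
lacks_hi = 'hi' not in text          # True

# str.find() returns the index of the first occurrence, or -1 if absent.
world_index = text.find('world')     # 6
missing_index = text.find('hi')      # -1

print(contains_hello, lacks_hi, world_index, missing_index)
```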

But there’s a problem with the membership operator. The return value is a Boolean value.

However, the advantage of Python’s regular expression library re is that it returns a match object which contains more interesting information such as the exact location of the matching substring.

So let’s explore the problem of exact string matching using the regex library next:

How to Match an Exact String (Regex)?

To match an exact string using Python’s regex library re, use the string as a regex. For example, you can call re.search('hello', 'hello world') to match the exact string 'hello' in the string 'hello world' and return a match object.

Here’s how you can match an exact substring in a given string:

>>> import re
>>> re.search('hello', 'hello world')
<re.Match object; span=(0, 5), match='hello'>

After importing Python’s library for regular expression processing re, you use the re.search(pattern, string) method to find the first occurrence of the pattern in the string.

💡 Related Tutorial: If you’re unsure about the re.search() method, check out my detailed tutorial on this blog.

This returns a match object that wraps a lot of useful information such as the start and stop matching positions and the matching substring.

As you’re looking for exact string matches, the matching substring will always be the same as your searched word.
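To make this concrete, here’s a small sketch of how you might read that information from a match object. The methods group(), start(), end(), and span() are part of the standard re module; the sample string is made up.

```python
import re

match = re.search('hello', 'say hello world')

# re.search() returns None if nothing matches, so check before using it.
if match:
    print(match.group())   # the matching substring: 'hello'
    print(match.start())   # index where the match starts: 4
    print(match.end())     # index just after the match: 9
    print(match.span())    # both as a tuple: (4, 9)

no_match = re.search('hello', 'goodbye')
print(no_match)            # None
```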

But wait, there’s another problem: you wanted an exact match, right?

Using the previous approach does not help because you’re getting prefix matches of your searched word:

>>> re.search('good', 'goodbye')
<re.Match object; span=(0, 4), match='good'>

When searching for the exact word 'good' in the string 'goodbye' it actually matches the prefix of the word.

Is this what you wanted? If not, read on:

How to Match a Word in a String (Word Boundary \b)?

Searching for a word as a plain pattern also finds it as a substring anywhere in the string, even in the middle of another word.

Here’s an example:

>>> 'no' in 'nobody knows'
True

And another example:

>>> re.search('see', 'dfjkyldsssseels')
<re.Match object; span=(10, 13), match='see'>

What if you want to match only whole words—not exact substrings?

The answer is simple:

To match whole exact words, use the word boundary metacharacter '\b'. This metacharacter matches at the beginning and end of each word—but it doesn’t consume anything. In other words, it simply checks whether the word starts or ends at this position (by checking for whitespace or non-word characters).

Here’s how you use the word boundary character to ensure that only whole words match:

>>> import re
>>> re.search(r'\bno\b', 'nobody knows')
>>> re.search(r'\bno\b', 'nobody knows nothing - no?')
<re.Match object; span=(23, 25), match='no'>

In both examples, you use the same regex r'\bno\b' that searches for the exact word 'no' but only if the word boundary character '\b' matches before and after.

In other words, the word 'no' must appear on its own as a separate word. It is not allowed to appear within another sequence of word characters.

As a result, the regex doesn’t match in the string 'nobody knows' but it matches in the string 'nobody knows nothing - no?'.
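If you want to reuse this idea for arbitrary words, one possible sketch is a small helper function. The name contains_word is hypothetical and not part of the re module. It relies on re.escape() so that special characters in the word are treated literally.

```python
import re

def contains_word(word, text):
    """Return True if `word` occurs in `text` as a whole word (hypothetical helper)."""
    # re.escape() neutralizes regex metacharacters such as '.' or '?' in the word.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text) is not None

print(contains_word('no', 'nobody knows'))                 # False
print(contains_word('no', 'nobody knows nothing - no?'))   # True
print(contains_word('good', 'goodbye'))                    # False
```

One limitation of this sketch: '\b' only marks a transition between a word character and a non-word character, so it works as intended only if the word itself starts and ends with a letter, digit, or underscore.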

Note that we use a raw string r'...' to write the regex so that the escape sequence '\b' reaches the regex engine unchanged.

  • Without the raw string, Python would interpret '\b' as a single backspace character (ASCII 8) before the regex engine ever sees it, so the pattern would look for literal backspaces instead of word boundaries.
  • With the raw string, all backslashes will just be that: backslashes. The regex engine then interprets the two characters as one special metacharacter: the word boundary '\b'.
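You can check the difference between the two string literals yourself. The following sketch compares a normal string with a raw string; the variable names are only for illustration.

```python
import re

plain = '\bno\b'    # each '\b' becomes one backspace character (ASCII 8)
raw = r'\bno\b'     # backslash and 'b' stay two separate characters

print(len(plain))   # 4
print(len(raw))     # 6

text = 'nobody knows nothing - no?'
plain_result = re.search(plain, text)   # None: the text contains no backspace
raw_result = re.search(raw, text)       # matches the standalone word 'no'

print(plain_result)
print(raw_result)
```

Doubling the backslashes in a normal string, as in '\\bno\\b', gives the same pattern as the raw string, but the raw string is usually easier to read.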

But what if you don’t care whether the word is uppercase, lowercase, or capitalized? In other words:

How to Match a Word in a String (Case Insensitive)?

You can search for an exact word in a string—but ignore capitalization. This way, it’ll be irrelevant whether the word’s characters are lowercase or uppercase.

Here’s how:

>>> import re
>>> re.search('no', 'NONONON', flags=re.IGNORECASE)
<re.Match object; span=(0, 2), match='NO'>
>>> re.search('no', 'NONONON', flags=re.I)
<re.Match object; span=(0, 2), match='NO'>
>>> re.search('(?i)no', 'NONONON')
<re.Match object; span=(0, 2), match='NO'>

All three ways are equivalent: they all ignore the capitalization of the word’s letters.

💡 Related Tutorial: If you need to learn more about the flags argument in Python, check out my detailed tutorial on this blog.

The third example uses the in-regex flag (?i) that also means: “ignore the capitalization”.
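You can also combine the case-insensitive flag with the word boundary from the previous section. Here’s a rough sketch with a made-up sample sentence. It uses re.findall() and re.compile(), both of which are standard functions of the re module.

```python
import re

text = 'No, NOBODY said no. NO!'

# Whole-word search that ignores capitalization.
words = re.findall(r'\bno\b', text, flags=re.IGNORECASE)
print(words)   # ['No', 'no', 'NO']

# The same pattern, compiled once so it can be reused.
pattern = re.compile(r'\bno\b', re.I)
first = pattern.search(text)
print(first.span())   # (0, 2)
```

Note that 'NOBODY' is skipped because the 'NO' in it is followed by another word character.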

How to Find All Occurrences of a Word in a String?

Okay, you’re never satisfied, are you? So let’s explore how you can find all occurrences of a word in a string.

In the previous examples, you used the re.search(pattern, string) method to find the first match of the pattern in the string.

Next, you’ll learn how to find all occurrences (not only the first match) by using the re.findall(pattern, string) method.

💡 Related Tutorial: You can also read my blog tutorial about the findall() method that explains all the details.

>>> import re
>>> re.findall('no', 'nononono')
['no', 'no', 'no', 'no']

Your code retrieves all matching substrings.

If you need to find all match objects rather than matching substrings, you can use the re.finditer(pattern, string) method:

>>> for match in re.finditer('no', 'nonononono'):
...     print(match)
...
<re.Match object; span=(0, 2), match='no'>
<re.Match object; span=(2, 4), match='no'>
<re.Match object; span=(4, 6), match='no'>
<re.Match object; span=(6, 8), match='no'>
<re.Match object; span=(8, 10), match='no'>

The re.finditer(pattern, string) method creates an iterator that iterates over all matches and returns the match objects. This way, you can find all matches and get the match objects as well.
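For example, if you only care about where the word occurs, you could collect the positions in a list. This is just a sketch with illustrative variable names. It also shows that both methods return non-overlapping matches.

```python
import re

text = 'nonononono'

# Collect the (start, end) position of every match.
spans = [m.span() for m in re.finditer('no', text)]
print(spans)   # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]

# Matches never overlap: 'nono' fits into 'nonono' only once this way.
non_overlapping = re.findall('nono', 'nonono')
print(non_overlapping)   # ['nono']
```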

How to Find All Lines Containing an Exact Word?

Say you want to find all lines that contain the word '42' from a multi-line string in Python. How’d you do it?

The answer makes use of a fine Python regex specialty: the dot regex matches all characters, except the newline character. Thus, the regex '.*' will match all characters in a given line (but then stop).

Here’s how you can use this fact to get all lines that contain a certain word:

>>> import re
>>> s = '''the answer is 42
the answer: 42
42 is the answer
43 is not'''
>>> re.findall('.*42.*', s)
['the answer is 42', 'the answer: 42', '42 is the answer']

Three out of four lines contain the word '42'. The findall() method returns these as strings.
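Keep in mind that '.*42.*' treats '42' as a substring, so a line containing '142' would match as well. The following sketch uses a slightly modified sample string (the last line is changed to show the difference) and adds the word boundary '\b' from earlier. It also shows a regex-free variant with the in operator.

```python
import re

s = '''the answer is 42
the answer: 42
42 is the answer
142 is not'''

# Substring search: the last line matches too, because '142' contains '42'.
substring_lines = re.findall('.*42.*', s)

# Whole-word search: '\b' rejects the '42' inside '142'.
word_lines = re.findall(r'.*\b42\b.*', s)

# Regex-free substring variant using the membership operator.
plain_lines = [line for line in s.splitlines() if '42' in line]

print(substring_lines)
print(word_lines)
print(plain_lines)
```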

How to Find All Lines Not Containing an Exact Word?

In the previous section, you’ve learned how to find all lines that contain an exact word.

In this section, you’ll learn how to do the opposite: find all lines that do NOT contain an exact word.

This is a bit more complicated. I’ll show you the code first and explain it afterwards:

import re
s = '''the answer is 42
the answer: 42
42 is the answer
43 is not
the answer'''

for match in re.finditer('^((?!42).)*$', s, flags=re.M):
    print(match)

<re.Match object; span=(49, 58), match='43 is not'>
<re.Match object; span=(59, 69), match='the answer'>

You can see that the code successfully matches only the lines that do not contain the string '42'.

How can you do it?

The general idea is to match a line that doesn’t contain the string '42', print it to the shell, and move on to the next line. The re.finditer(pattern, string) method accomplishes this easily by returning an iterator over all match objects.

The regex pattern '^((?!42).)*$' matches the whole line from the first position '^' to the last position '$'.

Related Tutorial: If you need a refresher on the start-of-the-line and end-of-the-line metacharacters, read this 5-min tutorial.

In between, you match an arbitrary number of characters: the asterisk quantifier does that for you.

Related Tutorial: If you need help understanding the asterisk quantifier, check out this blog tutorial.

Which characters do you match? Only those where you don’t have the negative word '42' in your lookahead.

Related Tutorial: If you need a refresher on lookaheads, check out this tutorial.

As the lookahead itself doesn’t consume a character, we need to consume it manually by adding the dot metacharacter . which matches all characters except the newline character '\n'.

Related Tutorial: As it turns out, there’s also a blog tutorial on the dot metacharacter.

Finally, you need to set the re.MULTILINE flag, in short: re.M, because it allows the start ^ and end $ metacharacters to match also at the start and end of each line (not only at the start and end of the whole string).

Together, this regular expression matches all lines that do not contain the specific word '42'.
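In the spirit of this article’s title, you may not need a regex here at all. As a sketch, the same lines can be selected with str.splitlines() and the not in operator. The comparison with the regex version is included only to show that both give the same result for this sample string.

```python
import re

s = '''the answer is 42
the answer: 42
42 is the answer
43 is not
the answer'''

# Regex-free variant: keep every line that does not contain '42'.
plain_lines = [line for line in s.splitlines() if '42' not in line]
print(plain_lines)   # ['43 is not', 'the answer']

# Regex variant from above, reduced to the matched strings.
regex_lines = [m.group() for m in re.finditer('^((?!42).)*$', s, flags=re.M)]
print(regex_lines)   # ['43 is not', 'the answer']
```

The regex version still has its place if you need the match positions rather than just the line contents.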

Where to Go From Here?

Summary: You’ve learned multiple ways of matching an exact word in a string.

  • You can use the simple Python membership operator.
  • You can use a default regex with no special metacharacters.
  • You can use the word boundary metacharacter '\b' to match only whole words.
  • You can match case-insensitive by using the flags argument re.IGNORECASE.
  • You can match not only one but all occurrences of a word in a string by using the re.findall() or re.finditer() methods.
  • And you can match all lines containing and not containing a certain word.

Pheww. This was some theory-heavy stuff. Do you feel like you need some more practical stuff next?

Then check out my practice-heavy Python freelancer course that helps you prepare for the worst and create a second income stream by creating your thriving coding side-business online.

💡 Click to learn more about the Finxter freelancer course and start your thriving coding business online (side income or full-time).

Also, you may enjoy this course on the Finxter computer science academy:

Python Regex Course

Google engineers are regular expression masters. The Google search engine is a massive text-processing engine that extracts value from trillions of webpages.

Facebook engineers are regular expression masters. Social networks like Facebook, WhatsApp, and Instagram connect humans via text messages.

Amazon engineers are regular expression masters. Ecommerce giants ship products based on textual product descriptions. Regular expressions rule the game when text processing meets computer science.

If you want to become a regular expression master too, check out the most comprehensive Python regex course on the planet:

Note all courses are available for free for Finxter Freelancer Course students!

Programmer Humor

Q: How do you tell an introverted computer scientist from an extroverted computer scientist?

A: An extroverted computer scientist looks at your shoes when he talks to you.