How to Find All Matches using Regex


Problem Formulation and Solution Overview

In this article, you’ll learn how to find all matches in a string using regex.

A regular expression, also referred to as a regex, is a pattern used to search for and locate matching character(s) within a string. At first, this concept may seem daunting, but with practice, regex becomes a powerful addition to your coding skills.

To make it more fun, we will find all matches for the word John in the paragraph below (a snippet from Elton John’s biography).

Born Reginald Kenneth Dwight on 25 March 1947, John is a British singer, pianist and composer. John is commonly nicknamed Rocket Man after his hit of the same name. John has led a successful career as a solo artist since the 1970s.

💬 Question: How would we write code to find all matches using a regular expression (regex) in Python?

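We can accomplish this task with one of the following options:

  • Method 1: Use regex findall()
  • Method 2: Use regex finditer()
  • Method 3: Use regex search()
  • Method 4: Use regex sub()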

Preparation

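To run these code examples error-free, the re module must be imported. It is part of Python's standard library, so no installation is needed. The third-party regex package is an optional alternative and must be installed separately (for example, with pip install regex).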

import re
# or import regex

Method 1: Use regex findall()

The re.findall() function is part of Python's re module. It searches a string for all non-overlapping matches of a pattern, returns them as a list, and has the following syntax: re.findall(pattern, string, flags=0)

import re

elton_bio = """
        Born Reginald Kenneth Dwight on 25 March 1947, 
        John is a British singer, pianist and composer. 
        John is commonly nicknamed Rocket Man after his 
        hit of the same name. JoHn has led a successful  
        career as a solo artist since the 1970s. 
"""

matches = re.findall(r'J\w+', elton_bio, re.IGNORECASE | re.MULTILINE)
print(matches)

The code above imports the re module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to elton_bio. Note that the third occurrence of the name is written in mixed case (JoHn).

Next, re.findall() is called and passed the following arguments:

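  • The search pattern (r'J\w+'), which matches the letter J followed by one or more word characters. The r prefix makes this a raw string, so Python passes the backslash in \w to the regex engine unchanged.
  • The string to search, elton_bio.
  • Two (2) regex flags. re.IGNORECASE makes the match case-insensitive. re.MULTILINE makes the ^ and $ anchors match at the start and end of each line; this pattern uses neither anchor, so the flag has no effect here, and a multi-line string is searched in full either way.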

The results return as a list and save to matches.

💡 Note: When passing more than one (1) flag, separate them with the pipe (|) character.
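As a rough illustration of what re.MULTILINE changes, the sketch below uses the ^ anchor, which the article's pattern does not contain. The sample string and the variable names start_only and each_line are made up for this example.

```python
import re

# Hypothetical three-line sample; only the first two lines start with "John"
text = "John sings.\nJohn plays piano.\nThen John rests."

# Without re.MULTILINE, ^ matches only at the very start of the string
start_only = re.findall(r'^John', text)

# With re.MULTILINE, ^ also matches right after each newline
each_line = re.findall(r'^John', text, re.MULTILINE)

print(start_only)  # ['John']
print(each_line)   # ['John', 'John']
```

A pattern without ^ or $, such as r'J\w+', returns the same matches with or without this flag.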

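When the output is sent to the terminal, three (3) matches are found. The mixed-case JoHn would be matched even without re.IGNORECASE (or re.I), because it still starts with an uppercase J. The flag only makes a difference for spellings that start with a lowercase j, such as john.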

['John', 'John', 'JoHn']

💡 Note: Regex flags have short forms, such as:
re.I is the same as re.IGNORECASE, re.M is the same as re.MULTILINE.
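To see where the case flag actually matters, here is a small sketch with a made-up sentence that contains lowercase and uppercase spellings. The sample text and variable names are assumptions for illustration only.

```python
import re

# Hypothetical sample with several spellings of the name
text = "John met john and JOHN; Johnny stayed home."

# Case-sensitive: only an uppercase J can start a match
case_sensitive = re.findall(r'J\w+', text)

# re.I is the short form of re.IGNORECASE
ignore_case = re.findall(r'J\w+', text, re.I)

# Word boundaries (\b) limit matches to the whole word "john" in any case
whole_word = re.findall(r'\bjohn\b', text, re.I)

print(case_sensitive)  # ['John', 'JOHN', 'Johnny']
print(ignore_case)     # ['John', 'john', 'JOHN', 'Johnny']
print(whole_word)      # ['John', 'john', 'JOHN']
```

Note that J\w+ also picks up Johnny; the \b version is one way to avoid that if only the exact word is wanted.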


Method 2: Use regex finditer()

This method uses re.finditer() from the re module. This option may be best if a large number of matches is expected, as it returns an iterator that yields match objects one at a time instead of building a list.

import re

elton_bio = """
        Born Reginald Kenneth Dwight on 25 March 1947, 
        John is a British singer, pianist and composer. 
        John is commonly nicknamed Rocket Man after his 
        hit of the same name. JoHn has led a successful  
        career as a solo artist since the 1970s. 
"""

result = re.finditer(r'J\w+', elton_bio)

for match in result:
    print(match.group())

The code above imports the re module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to elton_bio.

Then re.finditer() is called and passed two (2) arguments:

  • The search pattern (r'J\w+'). The r prefix makes this a raw string, so Python passes the backslash in \w to the regex engine unchanged.
  • The multi-line string to search, elton_bio.

An iterator is returned and saved to result. If result were printed to the terminal, an object similar to the one below would display.

<callable_iterator object at 0x0000021F3CB2B430>

To view the matches, a for loop iterates over result and prints each match.group() to the terminal.

John
John
JoHn

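💡 Note: The output displays all three (3) matches, even though no flags were passed and the last match is in mixed case. The pattern only requires an uppercase J; \w+ then matches word characters of either case.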
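Because re.finditer() yields match objects rather than plain strings, each match also carries its position. The following is a minimal sketch on a made-up sentence; the variable names are illustrative.

```python
import re

# Hypothetical sample sentence
text = "John is a singer. John is a pianist."

# Each item is a match object; span() gives (start, end) indexes
for match in re.finditer(r'J\w+', text):
    print(match.group(), match.span())

# Collect the positions of all matches in a list
positions = [m.span() for m in re.finditer(r'J\w+', text)]
print(positions)  # [(0, 4), (18, 22)]
```

This is something re.findall() cannot provide, since it returns only the matched text.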


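Method 3: Use regex search()

This method calls re.search() repeatedly. On its own, re.search() returns only the first match, so a helper function loops over the string and collects every match in a list.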

import re

elton_bio = """
        Born Reginald Kenneth Dwight on 25 March 1947, 
        John is a British singer, pianist and composer. 
        John is commonly nicknamed Rocket Man after his 
        hit of the same name. JoHn has led a successful  
        career as a solo artist since the 1970s. 
"""

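def find_all(regex, text):
    pattern = re.compile(regex)
    match_list = []
    pos = 0
    while pos <= len(text):
        # Search for the next match, starting at the current position
        match = pattern.search(text, pos)
        if not match:
            break
        match_list.append(match.group(0))
        # Continue after this match; step one extra character past an
        # empty match so the loop always advances
        pos = match.end() + (1 if match.end() == match.start() else 0)
    return match_list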

print(find_all(r'J\w+', elton_bio))

The code above imports the re module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to elton_bio.

Next, the function find_all is defined with two (2) parameters: the regex pattern (regex) and the string to search (text).

The following lines loop through the string, searching for the next pattern match on each pass. Each match is extracted and appended to match_list, and the list is returned once no further match is found.

Finally, the above function is called and passed the appropriate arguments. The results return and are output to the terminal.

['John', 'John', 'JoHn']

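💡 Note: The output displays all three (3) matches, even though no flags were passed and the last match is in mixed case. The pattern only requires an uppercase J; \w+ then matches word characters of either case.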
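For comparison, here is a minimal sketch of what re.search() returns on its own: a single match object for the first match, or None when nothing matches. The sample string and variable names are illustrative and not part of the example above.

```python
import re

# Hypothetical sample sentence
text = "John is a singer. John is a pianist."

# Only the first occurrence is reported
first = re.search(r'J\w+', text)
print(first.group(), first.span())  # John (0, 4)

# No match: re.search() returns None, so check before calling group()
missing = re.search(r'Z\w+', text)
print(missing)  # None
```

This is why a loop is needed to collect every match with re.search().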


Method 4: Use regex sub()

What happens if you want to find each occurrence of ‘John’ and replace it with ‘Elton John’? You could use re.sub() with the following syntax:
re.sub(pattern, repl, string, count=0, flags=0)

import re

elton_bio = """
        Born Reginald Kenneth Dwight on 25 March 1947, 
        John is a British singer, pianist and composer. 
        John is commonly nicknamed Rocket Man after his 
        hit of the same name. JoHn has led a successful  
        career as a solo artist since the 1970s. 
"""

new_ebio = re.sub(r'J\w+', 'Elton John', elton_bio)
print(new_ebio)

The code above imports the re module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to elton_bio.

The following line calls re.sub() with three (3) arguments:

  • The search pattern (r'J\w+'). The r prefix makes this a raw string, so Python passes the backslash in \w to the regex engine unchanged.
  • The replacement string ‘Elton John’.
  • The multi-line string to apply the replacement to, elton_bio.

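The result is saved to new_ebio and output to the terminal. It is shown below as a single paragraph; the actual output keeps the line breaks and indentation of elton_bio.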

Born Reginald Kenneth Dwight on 25 March 1947, Elton John is a British singer, pianist and composer. Elton John is commonly nicknamed Rocket Man after his hit of the same name. Elton John has led a successful career as a solo artist since the 1970s.
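The optional count argument from the syntax above limits how many replacements are made, and the related re.subn() function also reports how many were performed. Below is a short sketch on a made-up sentence; the variable names are assumptions for illustration.

```python
import re

# Hypothetical sample sentence
text = "John sings. John plays. John rests."

# count=1 replaces only the first occurrence
first_only = re.sub(r'John', 'Elton John', text, count=1)
print(first_only)  # Elton John sings. John plays. John rests.

# re.subn() returns the new string and the number of replacements
new_text, n = re.subn(r'John', 'Elton John', text)
print(new_text)  # Elton John sings. Elton John plays. Elton John rests.
print(n)         # 3
```

The replacement count from re.subn() can double as a quick way to count all matches.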

Summary

These methods of finding all matches using regex should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!


Regex Humor

Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee.