How to Find All Matches using Regex

Problem Formulation and Solution Overview

In this article, you’ll learn how to find all matches in a string using regex.

A regular expression, often shortened to `regex`, is a pattern that describes the text to search for within a string. The concept may seem daunting at first, but with practice, regex becomes a powerful addition to your coding skills.

To make it more fun, we will find all matches for the word John in a short paragraph from Elton John’s biography, which appears in each code example.

💬 Question: How would we write code to find all matches using a Regular Expression (regex) in Python?

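We can accomplish this task with one of the following options:

• Method 1: Use `re.findall()`
• Method 2: Use `re.finditer()`
• Method 3: Use `re.search()`
• Method 4: Use `re.sub()`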

Preparation

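To run these code examples error-free, the `re` module must be imported. It ships with Python's standard library, so no installation is needed. The third-party `regex` package is a separate, optional module that must be installed before it can be imported.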

```python
import re
# or import regex (a separate third-party package that must be installed first)
```

Method 1: Use regex findall()

The `re.findall()` function is part of Python's `re` module. It scans a string from left to right and returns all non-overlapping matches of a pattern as a list. It has the following syntax: `re.findall(pattern, string, flags=0)`

```python
import re

elton_bio = """
Born Reginald Kenneth Dwight on 25 March 1947,
John is a British singer, pianist and composer.
John is commonly nicknamed Rocket Man after his
hit of the same name. JoHn has led a successful
career as a solo artist since the 1970s.
"""

matches = re.findall(r'J\w+', elton_bio, re.IGNORECASE | re.MULTILINE)
print(matches)
```

The code above imports the `re` module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to `elton_bio`.

Next, `re.findall()` is called and passed the following arguments:

• The search pattern (`r'J\w+'`), which matches a `J` followed by one or more word characters. The `r` prefix makes this a raw string, so Python passes backslashes such as the one in `\w` to the regex engine unchanged.
• The string to search, `elton_bio`.
• Two (2) regex flags. The first flag, `re.IGNORECASE`, makes the match case-insensitive. The second flag, `re.MULTILINE`, makes `^` and `$` match at the start and end of each line; this pattern uses neither anchor, so the flag has no effect here.

The results return as a list and save to `matches`.

💡Note: When passing more than one (1) flag, combine them with the pipe (`|`) character.

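When the output is sent to the terminal, three (3) matches are found: `['John', 'John', 'JoHn']`. With this pattern, the result is the same without `re.IGNORECASE` (or `re.I`), because the text contains no lowercase `j` and `\w+` already matches letters of either case. The flag would matter for a literal pattern such as `r'John'`, which would not match `JoHn` without it.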

💡Note: Regex flags have short forms, such as:
`re.I` is the same as `re.IGNORECASE`, and `re.M` is the same as `re.MULTILINE`.
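To make the effect of these two flags concrete, here is a minimal sketch. The sample strings `text` and `lines` are made up for illustration and are not part of the biography snippet.

```python
import re

# Hypothetical sample strings, chosen only to show the flags in action.
text = "John met JoHn and john."
lines = "John sings\nJohn plays"

# Without a flag, the literal pattern 'John' is case-sensitive.
strict = re.findall(r'John', text)

# re.I (short for re.IGNORECASE) also matches 'JoHn' and 'john'.
relaxed = re.findall(r'John', text, re.I)

# re.M (short for re.MULTILINE) only changes how ^ and $ behave:
# with it, ^ matches at the start of every line, not just the string.
first_only = re.findall(r'^John', lines)
every_line = re.findall(r'^John', lines, re.M)

print(strict)      # ['John']
print(relaxed)     # ['John', 'JoHn', 'john']
print(first_only)  # ['John']
print(every_line)  # ['John', 'John']
```

In short, `re.I` matters when the pattern contains literal letters, and `re.M` matters only when the pattern uses `^` or `$`.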

Method 2: Use regex finditer()

This method uses `re.finditer()` from the `re` module. It may be the better option when a large number of matches is expected, because it returns an iterator that yields match objects one at a time instead of building a full list.

```python
import re

elton_bio = """
Born Reginald Kenneth Dwight on 25 March 1947,
John is a British singer, pianist and composer.
John is commonly nicknamed Rocket Man after his
hit of the same name. JoHn has led a successful
career as a solo artist since the 1970s.
"""

# finditer() returns an iterator of match objects
result = re.finditer(r'J\w+', elton_bio)

for match in result:
    # group() returns the text of the current match
    print(match.group())
```

The code above imports the `re` module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to `elton_bio`.

Then `re.finditer()` is called and passed two (2) arguments:

• The search pattern (`r'J\w+'`). The `r` prefix makes this a raw string, so Python passes backslashes such as the one in `\w` to the regex engine unchanged.
• The multi-line string to search, `elton_bio`.

An iterator object returns and saves to `result`. If `result` was output to the terminal, an object such as `<callable_iterator object at 0x...>` would display (the address varies from run to run).

To view the matches, a `for` loop iterates over `result` and outputs each `match.group()` to the terminal.

💡Note: The output displays all three (3) matches, even though the last match is in mixed case, because `\w+` matches letters of either case.
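Because `re.finditer()` yields match objects instead of plain strings, each match also carries its position in the text. The sketch below illustrates this with a short, made-up `sample` string so the positions are easy to check by hand.

```python
import re

# Hypothetical sample string, kept short so the indexes are easy to verify.
sample = "John sings. JoHn plays."

# Collect the matched text together with its start and end index.
found = [(m.group(), m.start(), m.end())
         for m in re.finditer(r'J\w+', sample)]

print(found)  # [('John', 0, 4), ('JoHn', 12, 16)]
```

This is one reason to prefer `re.finditer()` over `re.findall()` when you need to know where each match occurs, not just what it is.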

Method 3: Use regex search()

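This method calls `re.search()` in a loop. Since `re.search()` returns only the first match it finds, the helper function below searches again from the end of each match and collects the results in a list.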

```python
import re

elton_bio = """
Born Reginald Kenneth Dwight on 25 March 1947,
John is a British singer, pianist and composer.
John is commonly nicknamed Rocket Man after his
hit of the same name. JoHn has led a successful
career as a solo artist since the 1970s.
"""

def find_all(regex, text):
    match_list = []
    while True:
        # search() returns the first match only, or None
        match = re.search(regex, text)
        if match:
            match_list.append(match.group(0))
            # continue searching in the text after this match
            text = text[match.end():]
        else:
            return match_list

print(find_all(r'J\w+', elton_bio))
```

The code above imports the `re` module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to `elton_bio`.

Next, the function `find_all` is defined with two (2) parameters: the regex pattern (`regex`) and the string to search (`text`).

The `while` loop searches the string for the pattern. Each match is appended to `match_list`, and `text` is then sliced so the next search starts right after that match. When no further match is found, the list is returned.

Finally, the function is called and passed the appropriate arguments. The results return and are output to the terminal.

💡Note: The output displays all three (3) matches, even though the last match is in mixed case.
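As a possible variation on the loop above, a compiled pattern's `search()` method accepts a start position, which avoids slicing the text on every pass. The function name `find_all_from` is hypothetical, and the sketch also adds a guard so that a pattern able to match an empty string cannot loop forever.

```python
import re

def find_all_from(pattern, text):
    """Hypothetical variant of find_all() that tracks a position instead of slicing."""
    compiled = re.compile(pattern)
    matches = []
    pos = 0
    # Stop once the start position has moved past the end of the text.
    while pos <= len(text):
        match = compiled.search(text, pos)
        if match is None:
            break
        matches.append(match.group(0))
        # Continue after the match; step one extra character after an
        # empty match so the loop cannot stall at the same position.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return matches

print(find_all_from(r'J\w+', "John sings. JoHn plays."))  # ['John', 'JoHn']
```

For everyday use, `re.findall()` or `re.finditer()` remain the simpler choice; a manual loop like this is mainly useful when extra logic is needed between matches.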

Method 4: Use regex sub()

What happens if you want to find each occurrence of ‘John’ and replace it with ‘Elton John’? You could use `re.sub()` with the following syntax:
`re.sub(pattern, repl, string, count=0, flags=0)`

```python
import re

elton_bio = """
Born Reginald Kenneth Dwight on 25 March 1947,
John is a British singer, pianist and composer.
John is commonly nicknamed Rocket Man after his
hit of the same name. JoHn has led a successful
career as a solo artist since the 1970s.
"""

new_ebio = re.sub(r'J\w+', 'Elton John', elton_bio)
print(new_ebio)
```

The code above imports the `re` module.

Then a multi-line string containing a snippet of Elton John’s biography is declared and saved to `elton_bio`.

The following line calls `re.sub()` with three (3) arguments:

• The search pattern (`r'J\w+'`). The `r` prefix makes this a raw string, so Python passes backslashes such as the one in `\w` to the regex engine unchanged.
• The replacement string `'Elton John'`.
• The multi-line string to apply this to, `elton_bio`.

The results save to `new_ebio` and are output to the terminal. All three (3) occurrences, including `JoHn`, are replaced with `Elton John`.
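Keep in mind that `J\w+` replaces any word starting with `J`, not only the name. The sketch below, which uses a made-up `sample` string, contrasts that pattern with a stricter one and uses `re.subn()` to report how many replacements were made.

```python
import re

# Hypothetical sample string that contains another word starting with 'J'.
sample = "John sings. JoHn plays. Jazz too."

# J\w+ replaces every word starting with 'J', including 'Jazz'.
broad = re.sub(r'J\w+', 'Elton John', sample)

# \bJohn\b with re.I targets only the name, in any letter case.
# subn() returns the new string and the number of replacements.
narrow, count = re.subn(r'\bJohn\b', 'Elton John', sample, flags=re.I)

print(broad)   # Elton John sings. Elton John plays. Elton John too.
print(narrow)  # Elton John sings. Elton John plays. Jazz too.
print(count)   # 2
```

Which pattern is appropriate depends on the text; the stricter one is safer when other words could start with the same letter.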

Summary

These methods of finding all matches using regex should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!