Python Regex Finditer()


To iterate over all matches of a pattern in a text, use the re.finditer(pattern, text) function:

Specification: re.finditer(pattern, text, flags=0)

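Definition: returns an iterator over all non-overlapping matches of the pattern in the text. The text is scanned from left to right, and the iterator yields one match object per match in the order found.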

The flags argument allows you to customize some advanced properties of the regex engine such as whether capitalization of characters should be ignored. You can learn more about the flags argument in my detailed blog tutorial.
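As a minimal sketch of the flags argument, the following compares the same pattern with and without re.IGNORECASE. The sample text is made up for illustration and is not part of the original example.

```python
import re

# Illustrative sample text with two capitalized words (an assumption, not from the article).
text = 'Python is the Best programming language'

# Without flags, [a-z]+ skips the uppercase letters 'P' and 'B'.
case_sensitive = [m.group() for m in re.finditer('[a-z]+', text)]

# With re.IGNORECASE, the same pattern also matches uppercase letters.
case_insensitive = [m.group() for m in re.finditer('[a-z]+', text, flags=re.IGNORECASE)]

print(case_sensitive)
# ['ython', 'is', 'the', 'est', 'programming', 'language']
print(case_insensitive)
# ['Python', 'is', 'the', 'Best', 'programming', 'language']
```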

Example: You can loop over the iterator to inspect every match. In contrast to the re.findall() method described above, which returns only strings, re.finditer() yields match objects that carry more information, such as the position of each match in the text.

import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'
for match in re.finditer(pattern, text):
    print(match)

'''
<re.Match object; span=(0, 6), match='python'>
<re.Match object; span=(7, 9), match='is'>
<re.Match object; span=(10, 13), match='the'>
<re.Match object; span=(14, 18), match='best'>
<re.Match object; span=(19, 30), match='programming'>
<re.Match object; span=(31, 39), match='language'>
<re.Match object; span=(40, 42), match='in'>
<re.Match object; span=(43, 46), match='the'>
<re.Match object; span=(47, 52), match='world'>
'''
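The extra information in each match object can be read with its group(), start(), and end() methods. Here is a short sketch that reuses the pattern and text from the example; the variable name positions is only illustrative.

```python
import re

pattern = '[a-z]+'
text = 'python is the best programming language in the world'

# Collect (substring, start index, end index) for each match object.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

for word, start, end in positions:
    # Slicing the text with the reported indices gives back the matched substring.
    assert text[start:end] == word
    print(f'{word!r} found at {start}-{end}')

# First line printed: 'python' found at 0-6
```

This is the kind of analysis that a plain list of substrings cannot support, because the substrings alone do not say where in the text they occurred.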

If you want to count the number of matches, you can use a count variable:

import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'

count = 0
for match in re.finditer(pattern, text):
    count += 1

print(count)
# 9

Or, more concisely, collect the matches in a list and take its length:

import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'

print(len(list(re.finditer(pattern, text))))
# 9

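Note that this approach counts only non-overlapping matches: once re.finditer() finds a match, the search resumes at the end of that match. For example, the pattern 'aa' matches the text 'aaaa' twice, not three times, so overlapping occurrences are not counted.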
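To make the non-overlapping behavior concrete, the sketch below runs re.finditer() on a text where occurrences overlap. The text 'aaaa' and the lookahead pattern '(?=aa)' are assumptions chosen for illustration; the lookahead is one common workaround, not something the examples above rely on.

```python
import re

# Illustrative text in which the occurrences of 'aa' overlap.
text = 'aaaa'

# Each match consumes its characters, so 'aa' is found at index 0 and 2 only.
non_overlapping = [m.start() for m in re.finditer('aa', text)]

# A zero-width lookahead consumes nothing, so every start position is tested.
overlapping = [m.start() for m in re.finditer('(?=aa)', text)]

print(non_overlapping)
# [0, 2]
print(overlapping)
# [0, 1, 2]
```

With the lookahead, each match is an empty string, so use m.start() to locate the occurrence rather than m.group().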