Before we dive into named groups, let’s quickly recap normal capture groups in Python.
If you already know normal “unnamed capturing groups” well, feel free to skip the first section and move right away to the next one about named capture groups in Python. 👇
What Are Python Regex Groups?
A normal Python regex group is a part of a regex pattern enclosed in the parenthesis characters '(' and ')'. A group bundles together the expressions inside it and lets you capture specific parts of a match.
Groups are numbered starting from 0, and this numbering lets you access the different subparts of a match. Group 0 always refers to the whole match:
import re

pattern = '(ab)(a(aa)a)'
text = 'abaaaa'
m = re.match(pattern, text)
print(m.group(0))  # abaaaa
The subsequent groups capture individual parts of the pattern. They are numbered from left to right, and each opening parenthesis increases the group counter by one:
print(m.group(1))  # ab
print(m.group(2))  # aaaa
print(m.group(3))  # aa (the nested group opens third)
Capture groups also let you use helpful match-object methods such as group(), start(), and end() to access (meta) information about each matched group and where it occurs in the string.
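As a minimal sketch reusing the pattern from above: start(), end(), and span() accept a group number (defaulting to 0, the whole match) and report where that group matched, while groups() returns all captured subparts at once.

```python
import re

# Same pattern and text as in the example above
m = re.match('(ab)(a(aa)a)', 'abaaaa')

# Positions of group 2 ('aaaa') within the string
print(m.start(2), m.end(2))  # 2 6

# Start and end of the nested group 3 ('aa') as a tuple
print(m.span(3))             # (3, 5)

# Every numbered group except group 0
print(m.groups())            # ('ab', 'aaaa', 'aa')
```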
🌍 Recommended Tutorial: Python Regex Groups
Okay, now that you know what normal capture groups are, let’s dive into named groups next! 👇
What Is a Named Group in a Python Regular Expression?
A named group works like a normal group: it groups together a part of the regex pattern and lets you access the matching part of the string using helper methods.
The difference is that you assign a string name to a named group, so you can access it through a human-readable identifier rather than a numerical one such as 1 or 2.
The motivation: if you have only a couple of groups, you can remember their numbers or count them quickly. With many groups, however, identifying a specific group by its number becomes hard to read and error prone. For unnamed groups, you have to count opening parentheses from left to right to find a group’s number, which is easy to get wrong.
To maximize the readability of your regex programs, you can use named groups of the form (?P<name>...), with name being the string identifier associated with that particular group.
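Here is a short sketch of that syntax; the group names year and month and the input string are made up for illustration. A named group is still numbered like any other group, so you can access it by name or by position, and the compiled pattern’s groupindex attribute maps each name to its number.

```python
import re

# Two named groups; "year" and "month" are illustrative names
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')

# groupindex maps each group name to its group number
print(dict(pattern.groupindex))  # {'year': 1, 'month': 2}

m = pattern.match('2024-05')
print(m.group('year'))   # 2024
print(m.group(1))        # 2024 (same group, accessed by number)
print(m.group('month'))  # 05
```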
Here’s an example:
Python Regex Named Group Example
For example, you match the name and income in the string 'Alice 97000' using two whitespace-separated named groups, (?P<name>.*) and (?P<income>.*). You access them through the string identifiers group('name') and group('income') rather than the integer identifiers group(1) and group(2).
import re

match = re.search('(?P<name>.*) (?P<income>.*)', 'Alice 97000')
print(match.group('name'))    # Alice
print(match.group('income'))  # 97000
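Besides group(), a match object offers two other convenient ways to read named groups, sketched below with the same string: groupdict() returns all named groups as a dictionary, and subscripting with match['name'] (available since Python 3.6) is shorthand for match.group('name').

```python
import re

match = re.search('(?P<name>.*) (?P<income>.*)', 'Alice 97000')

# All named groups at once, as a dict mapping name -> matched text
print(match.groupdict())  # {'name': 'Alice', 'income': '97000'}

# Subscript syntax, equivalent to match.group('income')
print(match['income'])    # 97000
```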
Python Regex Named Group Multiple Matches
Say you combine a named group with a quantifier such as + with the goal of matching that group multiple times in the regex:
import re

pattern = '(?P<two_a>aa)+'
text = 'aaaaaa'
m = re.match(pattern, text)
print(m.group('two_a'))  # aa
How would you access every match of the named group? In the previous example, the group matches three times, but the match object retains only the last repetition, so group('two_a') returns a single 'aa'.
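A quick way to see what happens with a quantified group is to compare the span of the whole match with the span of the named group. In this sketch, the whole match covers all six characters, while the named group only reports the position of its final repetition.

```python
import re

m = re.match('(?P<two_a>aa)+', 'aaaaaa')

# The whole match spans all three repetitions of the group
print(m.span())         # (0, 6)

# The named group only keeps its last repetition
print(m.span('two_a'))  # (4, 6)
```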
To access every match of the same named group, drop the quantifier + after the group and use the re.finditer() method to get all matches of the pattern in the given string.
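re.findall(): The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of pattern. It returns a list of strings in the order in which the matches are found. If pattern contains exactly one group, the list holds the text captured by that group; with several groups, it holds tuples of captured strings.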
re.finditer(): The re.finditer(pattern, text) method returns an iterator over all matches of pattern in text. Unlike re.findall(), it yields match objects rather than matching strings.
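The following sketch contrasts both functions on this section’s pattern without the quantifier. Because the pattern contains exactly one group, re.findall() returns the strings captured by that group, while re.finditer() yields match objects that still let you look up the group by name.

```python
import re

pattern = '(?P<two_a>aa)'
text = 'aaaaaa'

# findall: plain strings (the contents of the single group)
print(re.findall(pattern, text))  # ['aa', 'aa', 'aa']

# finditer: match objects, so named access and positions remain available
starts = [m.start('two_a') for m in re.finditer(pattern, text)]
print(starts)                     # [0, 2, 4]
```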
Here’s the same example using this approach, which also gives you the start and end indices of each named group match:
import re

pattern = '(?P<two_a>aa)'
text = 'aaaaaa'
matches = re.finditer(pattern, text)
for m in matches:
    print(m.group('two_a'), m.span())
The output shows the multiple matches of the same named group pattern 'aa' in the string:
aa (0, 2)
aa (2, 4)
aa (4, 6)
Python Regex Optional Named Groups
💬 Question: How can you make a named capturing group in a Python regex optional?
Use the named group syntax (?P<name>...) with your own name and append the question mark quantifier ? to make the whole named group match zero or one time, that is, to make it optional: (?P<name>...)?. For example, (?P<age>[0-9]+)? makes the age named group optional.
Here’s an example where the named group age is optional: the regex still matches the string, but because the string does not start with digits, the group does not participate in the match and group('age') returns None:
import re

pattern = '(?P<age>[0-9]+)?.*'
m = re.match(pattern, 'Alice no age')
print(m.group('age'))  # None
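If you prefer a fallback value over None, groupdict() accepts a default argument that is used for groups that did not participate in the match. The sketch below uses the fallback string 'unknown' and the input '42 Alice', both chosen only for illustration, to show the optional group with and without a match.

```python
import re

pattern = '(?P<age>[0-9]+)?.*'

with_age = re.match(pattern, '42 Alice')
without_age = re.match(pattern, 'Alice no age')

print(with_age.group('age'))     # 42
print(without_age.group('age'))  # None

# Substitute a fallback for named groups that did not participate
print(without_age.groupdict(default='unknown'))  # {'age': 'unknown'}
```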
Where to Go From Here?
Thanks for reading through the whole regex tutorial. Feel free to check out my in-depth guide on regular expressions on the Finxter blog to keep improving your regex skills:
🌍 Recommended Tutorial: Python Regex Superpower [Full Guide]
Also, to download Python cheat sheets and continuously improve your coding skills, make sure to check out our email academy (100% free):
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.