Python Regex Named Groups

Before we dive into named groups, let’s quickly recap normal capture groups in Python.

If you already know normal “unnamed capturing groups” well, feel free to skip the first section and move right away to the next one about named capture groups in Python. 👇

What Are Python Regex Groups?

A normal Python regex group is a part of a regex pattern enclosed in the parentheses characters '(' and ')'. The parentheses group together the expressions inside them and allow you to capture the specific part of the string that they match.

Groups are numbered and start from 0. This numbering allows you to access different matching subparts of the regex pattern.

The first group with index 0 is always the whole matched pattern:

import re


pattern = '(ab)(a(aa)a)'
text = 'abaaaa'

m = re.match(pattern, text)

print(m.group(0))
# abaaaa

Subsequent groups capture various parts of the pattern (think of the numbering going from left to right, with each opening parenthesis increasing the group counter by one):

print(m.group(1))
# ab

print(m.group(2))
# aaaa

Capture groups allow you to use helpful Python regex methods such as group(), span(), start(), and end() to gain access to different (meta) information about the matching pattern and where it occurs in the string.
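
As a minimal sketch, reusing the pattern and text from the example above, the following shows what these methods return when you pass them a group number. The variable names are only illustrative:

```python
import re

pattern = '(ab)(a(aa)a)'
text = 'abaaaa'

m = re.match(pattern, text)

# Group 3 is the innermost group (aa), opened by the third parenthesis
inner = m.group(3)
inner_span = m.span(3)
inner_start = m.start(3)
inner_end = m.end(3)

print(inner)
# aa

print(inner_span)
# (3, 5)

print(inner_start, inner_end)
# 3 5

# groups() returns all groups except group 0 as a tuple
print(m.groups())
# ('ab', 'aaaa', 'aa')
```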

🌍 Recommended Tutorial: Python Regex Groups

Okay, now that you know what normal capture groups are, let’s dive into named groups next! 👇

What Is a Named Group in a Python Regular Expression?

A named group in Python works like a normal group: it allows you to group together a part of the regex pattern and access the matching part of the string using helper methods.

The difference between a named group and a normal group is that you assign a string identifier name to the group so you can access each group using a human-readable string identifier rather than a numerical identifier such as 0, 1, 2.

The motivation is that with only a couple of groups, you can remember their numerical identifiers, and counting them is not too hard. With many groups, identifying a specific group by a numerical value becomes error-prone and makes the pattern hard to read and understand.

For unnamed groups, you’d have to count opening parentheses from left to right to get a group’s numerical id, which can easily lead to errors.

To maximize readability of your regex programs, you can use named groups in the form of (?P<name>...), name being the string identifier associated with that particular named group.

Here’s an example:

Python Regex Named Group Example

This example matches the name and income in the string 'Alice 97000' using two named groups separated by a space, (?P<name>.*) and (?P<income>.*). You access them using the group('name') and group('income') string identifiers rather than the group(1) and group(2) integer identifiers.

import re
match = re.search('(?P<name>.*) (?P<income>.*)', 'Alice 97000')

print(match.group('name'))
# Alice

print(match.group('income'))
# 97000
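
Named groups are still numbered like normal groups, so the integer identifiers keep working alongside the names. As a small sketch building on the same example, the match object's groupdict() method returns all named groups at once as a dictionary (the variable names here are just for illustration):

```python
import re

match = re.search('(?P<name>.*) (?P<income>.*)', 'Alice 97000')

# Named groups are still numbered, so both access styles return the same text
by_name = match.group('name')
by_number = match.group(1)

print(by_name == by_number)
# True

# groupdict() maps every group name to the text it matched
fields = match.groupdict()

print(fields['name'], fields['income'])
# Alice 97000
```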

Python Regex Named Group Multiple Matches

Say, you combine a named group with a quantifier such as * or + with the goal of matching that group multiple times in the regex:

import re


pattern = '(?P<two_a>aa)+'
text = 'aaaaaa'

m = re.match(pattern, text)

print(m.group('two_a'))
# aa

How would you access multiple matches of the named group? In the previous example, the group is repeated three times, but m.group('two_a') returns only one of them: the last repetition.
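
A quantified group keeps only the text of its last repetition. This is hard to see when every repetition is 'aa', so here is a minimal sketch with a made-up pattern and text (both are assumptions for illustration) in which each repetition matches something different:

```python
import re

# Hypothetical example: each repetition of the group matches different text
pattern = '(?P<pair>a[a-z])+'
text = 'abacad'

m = re.match(pattern, text)

whole = m.group(0)
last_pair = m.group('pair')

# The whole match covers all three repetitions
print(whole)
# abacad

# The named group only remembers the last repetition
print(last_pair)
# ad
```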

To use named groups for multiple matches of the same group, drop the quantifier * or + after the group and use the re.findall() or re.finditer() method to get all matches of the pattern in the given string.

  • re.findall(): The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the order they were found. If the pattern contains a single group, the list holds that group’s text for each match.
  • re.finditer(): The re.finditer(pattern, text) method returns an iterator over all matches of the pattern in text. Unlike re.findall(), it yields match objects rather than strings.

Here’s the same example using this approach to find the start and stop indices of the named group matches:

import re


pattern = '(?P<two_a>aa)'
text = 'aaaaaa'

matches = re.finditer(pattern, text)

for m in matches:
    print(m.group('two_a'), m.span())
    

The output shows the multiple matches of the same named group pattern 'aa' in the string:

aa (0, 2)
aa (2, 4)
aa (4, 6)
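
For comparison, here is a short sketch of both functions on the same pattern and text. It assumes you only need the matched strings from re.findall() and the positions from re.finditer():

```python
import re

pattern = '(?P<two_a>aa)'
text = 'aaaaaa'

# With exactly one group in the pattern, findall() returns that group's text
strings = re.findall(pattern, text)

# finditer() yields match objects, so the positions remain available
spans = [m.span('two_a') for m in re.finditer(pattern, text)]

print(strings)
# ['aa', 'aa', 'aa']

print(spans)
# [(0, 2), (2, 4), (4, 6)]
```

If you need group names or indices, re.finditer() is usually the better fit because re.findall() discards the match objects.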

Python Regex Optional Named Groups

💬 Question: How to make a named capturing group in a Python regex optional?

Use the named group syntax (?P<name>...) with your name and append a question mark quantifier ? after it to make the whole named group match either zero or one time, i.e., make it optional using (?P<name>...)?. For example, (?P<age>.*)? makes the age named group optional.

Here’s an example where the named group age is made optional. The regex still matches the string, but the named group does not take part in the match, so group('age') returns None rather than an empty string:

import re


pattern = '(?P<age>[0-9]+)?.*'

m = re.match(pattern, 'Alice no age')
print(m.group('age'))
# None
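
As a sketch of how you might handle the missing value, the following reuses the same pattern on a second, made-up input that does start with digits, and uses the default argument of groupdict() to substitute a placeholder for groups that did not match. The placeholder 'unknown' is an arbitrary choice:

```python
import re

pattern = '(?P<age>[0-9]+)?.*'

# Hypothetical inputs: one starts with an age, one does not
with_age = re.match(pattern, '42 years old')
without_age = re.match(pattern, 'Alice no age')

print(with_age.group('age'))
# 42

# groupdict(default=...) replaces None for groups that did not participate
print(without_age.groupdict(default='unknown'))
# {'age': 'unknown'}
```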

Where to Go From Here?

Thanks for reading through the whole regex tutorial. Feel free to check out my in-depth guide on regular expressions on the Finxter blog to keep improving your regex skills:

🌍 Recommended Tutorial: Python Regex Superpower [Full Guide]
