Python Regex to Return String Between Parentheses

Problem Formulation

Given a string s, how do you find the substring s' between an opening and a closing parenthesis?

Consider the following examples:

Input:     'Learn Python (not C++)'
Output:  'not C++'

Input:     'function(a, b, c, d)'
Output:  'a, b, c, d'

Input:     '(a+(b+c))'
Output:  'a+(b+c)'

Method 1: Slicing and str.find()

The simplest way to extract the string between two parentheses is to use slicing and str.find(). First, find the indices of the first opening and the first closing parenthesis. Second, use them as slice indices to get the substring between them, like so: s[s.find('(')+1:s.find(')')].

Here’s a straightforward example:

s = 'Learn Python (not C++)'
result = s[s.find('(')+1:s.find(')')]
print(result)

The result is the string:

'not C++'

The start index of the slicing operation is incremented by one to avoid including the opening parenthesis in the resulting string.
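
As a minimal sketch, the same slicing can be wrapped in a small helper. The name between_first_parens is hypothetical and not part of any library. It assumes you would rather get None than a misleading slice when a parenthesis is missing, because str.find() returns -1 in that case. Unlike the one-liner above, it also looks for the closing parenthesis only after the opening one.

```python
def between_first_parens(s):
    """Return the text between the first '(' and the next ')', or None.

    Hypothetical helper for illustration only.
    """
    start = s.find('(')
    if start == -1:
        return None  # no opening parenthesis at all
    # Search for the closing parenthesis only after the opening one.
    end = s.find(')', start + 1)
    if end == -1:
        return None  # opening parenthesis is never closed
    return s[start + 1:end]


print(between_first_parens('function(a, b, c, d)'))
# a, b, c, d
print(between_first_parens('Learn Python (not C++'))
# None
```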

Method 2: Slicing and rfind()

Alternatively, you can use the str.rfind() method to search for the closing parenthesis from the right instead of the left. This gives more meaningful output for nested parentheses.

s = '(Learn Python (not C++))'

print(s[s.find('(')+1:s.find(')')])
# Learn Python (not C++

print(s[s.find('(')+1:s.rfind(')')])
# Learn Python (not C++)
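
To connect this with the third example from the problem formulation, here is a short sketch that compares both slices on '(a+(b+c))'. The last case is an added illustration of a limitation: with two sibling groups, rfind() spans from the first opening to the last closing parenthesis, so this approach assumes a single outermost pair.

```python
s = '(a+(b+c))'

# Closing parenthesis searched from the left: stops too early.
print(s[s.find('(') + 1:s.find(')')])
# a+(b+c

# Closing parenthesis searched from the right: matches the outermost pair.
print(s[s.find('(') + 1:s.rfind(')')])
# a+(b+c)

# Sibling groups: the slice runs across both groups.
t = '(a) (b)'
print(t[t.find('(') + 1:t.rfind(')')])
# a) (b
```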

If the closing parenthesis doesn’t exist, str.find() returns -1. Used as the end of the slice, -1 means the slice runs towards the end of the string but excludes the last character.

This is exemplified here:

s = 'Learn Python (not C++'
result = s[s.find('(')+1:s.find(')')]
print(result)
# not C+

Clearly, this is not the goal of the operation. So, can we do better? And can we find all occurrences in case there are multiple such strings?

Yes. Regex to the rescue!

Method 3: Find All Occurrences with re.findall()

To find all strings between two parentheses, call the re.findall() function and pass the pattern r'\(.*?\)' as the first argument and the string to be searched as the second argument. Writing the pattern as a raw string (with the r prefix) keeps Python from interpreting the backslashes as string escapes.

  • The .*? part matches an arbitrary number of characters but is non-greedy, so each match stops at the first closing parenthesis instead of running on to a later one.
  • The '\( ... \)' part matches the opening and closing parentheses. You need to escape the parentheses so the regex engine treats them as literal characters rather than as the start and end of a regex group.

import re
s = '(Learn Python) (not C++)'
result = re.findall(r'\(.*?\)', s)
print(result)

The output is the list of matches:

['(Learn Python)', '(not C++)']
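
Note that these matches still include the parentheses themselves. If you only want the text between them, as in the problem formulation, one option is to add a capturing group around .*?. When the pattern contains exactly one group, re.findall() returns the group contents instead of the whole match. The variable name inner is only illustrative.

```python
import re

s = '(Learn Python) (not C++)'

# The inner, unescaped parentheses form a capturing group;
# the escaped ones match the literal characters.
inner = re.findall(r'\((.*?)\)', s)
print(inner)
# ['Learn Python', 'not C++']
```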

But what if you have nested parentheses, as in the string '(Learn Python (not C++))'? In this case, the pattern no longer works as intended: the non-greedy match starts at the first opening parenthesis and stops at the first closing parenthesis, so the result is unbalanced and the inner group is not reported on its own.

import re
s = '(Learn Python (not C++))'
result = re.findall(r'\(.*?\)', s)
print(result)
# ['(Learn Python (not C++)']

Let’s examine a more advanced solution I came up with.

Method 4: Find All Occurrences in Strings with Nested Parentheses

To find all occurrences even in a string with nested parentheses, you can consecutively search all substrings starting from a given start index in a for loop:

import re
s = '(Learn Python (not C++))'
results = set()
for start in range(len(s)):
    string = s[start:]
    results.update(re.findall(r'\(.*?\)', string))
print(results)
# {'(Learn Python (not C++)', '(not C++)'}
# (the order of elements in a printed set may vary)

This performs the following steps:

  • Create an empty set that collects all matching strings while discarding duplicates.
  • Iterate over all start indices from 0 to the length of the string to be searched, minus one.
  • Create a substring using slicing s[start:] to be searched for enclosing parentheses.
  • Find the next strings enclosed in parentheses using re.findall(r'\(.*?\)', string) and add them to the set.

Note that the outer match still ends at the first closing parenthesis, so it is not a balanced pair.
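
If you need balanced matches, a different technique than the regex loop above is a single pass with a stack of opening positions. The following is a minimal sketch, not the method described in this article. The function name balanced_groups is hypothetical, and the sketch assumes that unmatched closing parentheses should simply be ignored.

```python
def balanced_groups(s):
    """Collect every balanced '(...)' substring of s.

    Hypothetical helper for illustration only. Results are ordered by
    the position of the closing parenthesis, so inner groups come
    before the groups that enclose them.
    """
    stack = []   # indices of '(' that are still open
    groups = []
    for i, ch in enumerate(s):
        if ch == '(':
            stack.append(i)
        elif ch == ')' and stack:
            start = stack.pop()  # most recent unmatched '('
            groups.append(s[start:i + 1])
    return groups


print(balanced_groups('(Learn Python (not C++))'))
# ['(not C++)', '(Learn Python (not C++))']
```

Each character is visited once, so this avoids re-scanning the string from every start index.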

Summary

The simplest way to extract the string between two parentheses is to use slicing and str.find(). First, find the indices of the first opening and the first closing parenthesis. Second, use them as slice indices to get the substring between them, like so: s[s.find('(')+1:s.find(')')].

Alternatively, you can use the str.rfind() method to search for the closing parenthesis from the right instead of the left. This gives more meaningful output for nested parentheses.

To find all strings between two parentheses, call the re.findall() function and pass the pattern r'\(.*?\)' as the first argument and the string to be searched as the second argument.

To find all occurrences even in a string with nested parentheses, you can consecutively search all substrings starting from a given start index in a for loop.
