Python | Split String Parenthesis

🍎 Summary: To split a string at parentheses, call re.split(r'[()]', text) and then split each resulting piece at whitespace inside a list comprehension.

Minimal Example
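
A minimal sketch of the idea from the summary, using the sample string from the problem statement below. The variable names `parts` and `words` are illustrative only:

```python
import re

text = "abc xyz (ABC) (LMN) aabbcc"

# Split at every parenthesis, then split each piece at whitespace.
# Whitespace-only pieces contribute nothing because ' '.split() is [].
parts = re.split(r'[()]', text)
words = [word for part in parts for word in part.split()]
print(words)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
```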

Problem Formulation

📜 Problem: Given a string, how do you split it at parentheses and spaces?

Example

# Input
text = "abc xyz (ABC) (LMN) aabbcc"

# Output
['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']

In the above problem, the words in the string are separated by spaces, and some of them are enclosed in parentheses. How can you produce the expected output, with the parentheses removed?


Method 1: Using re.split

Prerequisite: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
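
To make the behavior of re.split concrete, here is a small sketch. The second call uses a shortened, made-up input to show that splitting on a character class can leave whitespace-only or empty pieces behind, which is why the approach in this method filters them out:

```python
import re

# Splitting on a single literal character.
simple = re.split('a', 'bbabbbab')
print(simple)
# ['bb', 'bbb', 'b']

# A character class splits on any one of the listed characters.
# Note the ' ' between ')' and '(' and the empty string after the last ')'.
pieces = re.split(r'[()]', 'abc (ABC) (LMN)')
print(pieces)
# ['abc ', 'ABC', ' ', 'LMN', '']
```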

Related Read: Python Regex Split

Approach: Use re.split(r'[()]', text) to split the string at every parenthesis. The character class [()] matches either an opening or a closing parenthesis. The result is a list of substrings, some of which may be empty or contain only whitespace, so filter those out. Next, split each remaining substring at whitespace with split(). This yields a list of lists, which you can flatten with another list comprehension.

Code

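import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Split at parentheses and drop empty or whitespace-only pieces.
extract_parenthesis = [x for x in re.split(r'[()]', text) if x.strip()]
# Split every remaining piece at whitespace (gives a list of lists).
nested_result = [y.split() for y in extract_parenthesis]
# Flatten the list of lists into a single list.
res = [item for i in nested_result for item in i]
print(res)

# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']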

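Explanation: re.split(r'[()]', text) returns ['abc xyz ', 'ABC', ' ', 'LMN', ' aabbcc']. The if x.strip() filter drops the whitespace-only piece, split() turns each remaining piece into a list of words, and the final comprehension flattens [['abc', 'xyz'], ['ABC'], ['LMN'], ['aabbcc']] into a single list.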

Recommended Read: Flatten A List Of Lists In Python

Method 2: Using re.sub

Prerequisite: The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
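
A short sketch of re.sub in action. The second call is an illustrative example (not the article's full input) showing that an empty replacement string deletes every match and that the original string is left untouched:

```python
import re

replaced = re.sub('a', 'b', 'aabb')
print(replaced)
# bbbb

# An empty replacement deletes every match; re.sub returns a new string.
original = "(ABC) (LMN)"
stripped = re.sub(r'[()]', '', original)
print(stripped)
# ABC LMN
print(original)
# (ABC) (LMN)
```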

Related Read: Python Regex Sub

Approach: The idea here is to substitute every occurrence of a parenthesis with an empty string. The entire string can then be split using the split() function.

Code

import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Delete every parenthesis, then split the remaining text at whitespace.
text_r = re.sub(r'[()]', "", text).split()
print(text_r)

# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']

Method 3: Using re.findall

Prerequisite: The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.
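
The following sketch illustrates re.findall on a shortened sample string. It also shows one detail worth knowing: when the pattern contains a single capturing group, findall returns the text of that group rather than the whole match. The variable names are illustrative:

```python
import re

# Matches come back in left-to-right order and never overlap.
all_words = re.findall(r'\w+', 'abc xyz (ABC)')
print(all_words)
# ['abc', 'xyz', 'ABC']

# With one capturing group, findall returns the group, not the whole match.
in_parens = re.findall(r'\((\w+)\)', 'abc xyz (ABC) (LMN)')
print(in_parens)
# ['ABC', 'LMN']
```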

Related Read: Python re.findall() – Everything You Need to Know

Approach: Use re.findall to collect the items directly instead of splitting. The "|" (or) metacharacter lets a single pattern handle both cases at once: text enclosed in parentheses and words separated by whitespace.

Code

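import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Match a parenthesized group, a double-quoted group, or a plain word.
regex = r'\(.+?\)|".+?"|\w+'
# findall returns the whole match, so strip the surrounding parentheses.
result = [match.strip('()') for match in re.findall(regex, text)]
print(result)

# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']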
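
As a quick sanity check, the three approaches can be wrapped in functions and compared on the sample input. This is only a sketch: the function names are hypothetical, and the findall variant here uses a simplified pattern and strips the surrounding parentheses from each match so that it returns the same items as the other two.

```python
import re

def split_with_split(text):
    # Method 1: split at parentheses, then split each piece at whitespace.
    pieces = re.split(r'[()]', text)
    return [word for piece in pieces for word in piece.split()]

def split_with_sub(text):
    # Method 2: delete the parentheses, then split at whitespace.
    return re.sub(r'[()]', '', text).split()

def split_with_findall(text):
    # Method 3: match parenthesized groups or plain words, then strip '(' and ')'.
    matches = re.findall(r'\(.+?\)|\w+', text)
    return [m.strip('()') for m in matches]

text = "abc xyz (ABC) (LMN) aabbcc"
results = [f(text) for f in (split_with_split, split_with_sub, split_with_findall)]
print(all(r == results[0] for r in results))
# True
```

The variants agree on the sample input but can differ elsewhere. For "(A B)", the findall variant keeps 'A B' as a single item, while the other two return 'A' and 'B' separately.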

Conclusion

Hurrah! We have successfully split the given string at parentheses and spaces. I hope this discussion was helpful and answered all your queries. Please stay tuned and subscribe for more interesting reads and discussions.

Happy coding!

Recommended Read: Python Regex to Return String Between Parentheses

