Summary: You can split a string at parentheses with re.split(r'[()]', text) and then split each resulting piece on whitespace in a list comprehension.
Minimal Example
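A minimal sketch of the approach from the summary, assuming the sample string used throughout this article (the names part and word are illustrative):

```python
import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Split at parentheses, then split every piece on whitespace.
# Pieces that contain only whitespace contribute nothing, because
# ' '.split() returns an empty list.
res = [word for part in re.split(r'[()]', text) for word in part.split()]
print(res)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
```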
Problem Formulation
Problem: Given a string, how do you split it at parentheses and spaces?
Example
# Input
text = "abc xyz (ABC) (LMN) aabbcc"
# Output
['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
In the above problem, you are given a string whose words are separated by spaces, and some of those words are enclosed in parentheses. How do you produce the expected output?
Method 1: Using re.split
Prerequisite: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
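As a quick sketch of this behavior (the inputs 'x(y)z' and '(y)' are illustrative and not from the original text), note that a match at the very start or end of the string leaves an empty string in the result:

```python
import re

# Split on a single literal character
simple = re.split('a', 'bbabbbab')
print(simple)
# ['bb', 'bbb', 'b']

# A character class splits on any one of the listed characters,
# here either kind of parenthesis
parts = re.split(r'[()]', 'x(y)z')
print(parts)
# ['x', 'y', 'z']

# A match at the start or end of the string yields an empty string
edge = re.split(r'[()]', '(y)')
print(edge)
# ['', 'y', '']
```

Those empty strings are why the result usually needs filtering before further processing.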
Related Read: Python Regex Split
Approach: Use re.split(r'[()]', text) to split the string at every parenthesis. Here, the pattern [()] is a character class that matches either an opening or a closing parenthesis, so the string is split wherever one occurs. This gives you a list of substrings, some of which are empty or contain only whitespace; the condition if x.strip() filters those out. Your next task is to split each substring at whitespace, which you can do by calling split() on each one. However, this yields a list of lists, which you can flatten using another list comprehension.
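Code
import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Split at parentheses and drop pieces that are empty or only whitespace
extract_parenthesis = [x for x in re.split(r'[()]', text) if x.strip()]
# Split each remaining piece on whitespace (gives a list of lists)
nested_result = [y.split() for y in extract_parenthesis]
# Flatten the list of lists
res = [item for i in nested_result for item in i]
print(res)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']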
Explanation:
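One way to follow Method 1 is to print the value produced at each step. This is a minimal trace under the same sample input; the variable names pieces, non_empty, nested, and flat are illustrative and not from the original code:

```python
import re

text = "abc xyz (ABC) (LMN) aabbcc"

# Step 1: split at every parenthesis
pieces = re.split(r'[()]', text)
print(pieces)
# ['abc xyz ', 'ABC', ' ', 'LMN', ' aabbcc']

# Step 2: drop pieces that are empty or only whitespace
non_empty = [x for x in pieces if x.strip()]
print(non_empty)
# ['abc xyz ', 'ABC', 'LMN', ' aabbcc']

# Step 3: split each piece on whitespace
nested = [y.split() for y in non_empty]
print(nested)
# [['abc', 'xyz'], ['ABC'], ['LMN'], ['aabbcc']]

# Step 4: flatten the list of lists
flat = [item for sub in nested for item in sub]
print(flat)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
```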
Recommended Read: Flatten A List Of Lists In Python
Method 2: Using re.sub
Prerequisite: The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
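A short sketch of re.sub in action (the second input, '(ABC)', is an illustrative value): passing an empty replacement string deletes every match.

```python
import re

# Replace every 'a' with 'b'
replaced = re.sub('a', 'b', 'aabb')
print(replaced)
# bbbb

# An empty replacement removes every match
no_parens = re.sub(r'[()]', '', '(ABC)')
print(no_parens)
# ABC
```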
Related Read: Python Regex Sub
Approach: The idea here is to substitute every occurrence of a parenthesis with an empty string. The resulting string can then be split on whitespace using the split() method.
Code
import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Remove every parenthesis, then split on whitespace
text_r = re.sub(r'[()]', "", text).split()
print(text_r)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
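One caveat worth sketching: deleting the parentheses merges tokens that touch a parenthesis without a space in between. A possible variation, not part of the original method, is to replace each parenthesis with a space instead. The helper name split_parens_and_spaces and the input "f(x) g(y)" are hypothetical:

```python
import re

def split_parens_and_spaces(text):
    # Hypothetical helper: turn each parenthesis into a space so that
    # tokens touching a parenthesis stay separate, then split on whitespace
    return re.sub(r'[()]', ' ', text).split()

sample = "f(x) g(y)"

# Deleting the parentheses glues 'f' and 'x' together
merged = re.sub(r'[()]', '', sample).split()
print(merged)
# ['fx', 'gy']

# Replacing them with a space keeps the tokens apart
separated = split_parens_and_spaces(sample)
print(separated)
# ['f', 'x', 'g', 'y']
```

For the article's sample string both variants give the same result, because every parenthesized word there is already surrounded by spaces.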
Method 3: Using re.findall
Prerequisite: The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.
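A small sketch of re.findall with illustrative inputs. The second call shows what "non-overlapping" means: once two characters have been consumed by a match, they cannot be part of the next one.

```python
import re

# Every run of word characters, in left-to-right order
words = re.findall(r'\w+', 'abc xyz (ABC)')
print(words)
# ['abc', 'xyz', 'ABC']

# Non-overlapping: 'aaaa' contains two matches of 'aa', not three
pairs = re.findall('aa', 'aaaa')
print(pairs)
# ['aa', 'aa']
```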
Related Read: Python re.findall() - Everything You Need to Know
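Approach: Find all the items, whether enclosed in parentheses or separated by whitespace, using the re.findall function. Use the “or” metacharacter, “|”, to handle both cases (strings within parentheses and strings separated by whitespace) in a single pattern. Note that the pattern used here matches each parenthesized group together with its parentheses, so strip them from the matches if you need the bare words.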
Code
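import re

text = "abc xyz (ABC) (LMN) aabbcc"
# Match a parenthesized group, a double-quoted group, or a run of word characters
regex = r'\(.+?\)|".+?"|\w+'
matches = re.findall(regex, text)
print(matches)
# ['abc', 'xyz', '(ABC)', '(LMN)', 'aabbcc']

# The matches keep their parentheses; strip them to get the expected output
result = [m.strip('()') for m in matches]
print(result)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']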
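Because the pattern above matches each parenthesized group with its parentheses included, a possible alternative is a pattern that excludes them from the match in the first place. This is a sketch with a different pattern, not the one used in the article:

```python
import re

text = "abc xyz (ABC) (LMN) aabbcc"
# [^\s()]+ matches a run of characters that are neither
# whitespace nor parentheses
result = re.findall(r'[^\s()]+', text)
print(result)
# ['abc', 'xyz', 'ABC', 'LMN', 'aabbcc']
```

The trade-off is that several words inside one pair of parentheses come back as separate items rather than as a single group.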
Conclusion
Hurrah! We have successfully split the given string at parentheses and spaces. I hope this discussion was helpful and answered all your queries. Please stay tuned and subscribe for more interesting reads and discussions.
Happy coding!
Recommended Read: Python Regex to Return String Between Parentheses