Summary: Use shlex.split(text) to split the string on whitespace while keeping quoted substrings intact. When the fields are separated by a comma followed by a space, you can strip away the remaining comma characters like so: [x.strip(',') for x in shlex.split(text)]
Minimal Example
import shlex
text = 'abc, xyz, "lm,no,pq", uvw'
print([x.strip(',') for x in shlex.split(text)])
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
Problem Formulation
Problem: Given a string in Python, how do you split it on a certain delimiter, except when the delimiter occurs within a quoted substring?
Example
# Input
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
# Expected Output
['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
Discussion: In the above problem, the string is split at each occurrence of a comma. The substring "salary:$1,234,108" also contains commas, but because it is enclosed in double quotes, it must not be split at those commas. This is exactly what you need to achieve in this problem.
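To see why the quotes need special handling, here is a minimal sketch of what a plain str.split(',') does with the example input. The variable name naive is only illustrative.

```python
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# A plain split treats every comma as a delimiter, including the ones
# inside the quoted salary field.
naive = [part.strip() for part in text.split(',')]
print(naive)
# ['name:Kelly', 'age:26', '"salary:$1', '234', '108"', '"ID:1111"']
```

The salary field is broken into three pieces, which is the behavior the methods in this article avoid.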
Method 1: Use re.split()
Use regex split to split the string accordingly. Let’s have a look at the code — it looks complicated but I’ll explain it in a minute!
import re
text = 'abc,xyz,"lm,no,pq",uvw'
print(re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text))
# ['abc', 'xyz', '"lm,no,pq"', 'uvw']
Explanation:
The code imports the re module and uses its split() function to split the given text string into a list, while preserving the content within the double quotes, even if it contains commas:
- import re : Import the regular expression module re.
- text = 'abc,xyz,"lm,no,pq",uvw' : Define the input text string.
- re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text) : The re.split() function splits the input text based on the provided regular expression pattern:
  - , : Match a comma.
  - \s* : Match zero or more whitespace characters.
  - (?=(?:[^"]*"[^"]*")*[^"]*$) : This is a positive lookahead assertion that ensures the comma being matched is not inside double quotes:
    - (?:[^"]*"[^"]*")* : This non-capturing group matches an even number of double quotes (i.e., complete pairs of double quotes).
    - [^"]* : Match zero or more characters that are not double quotes.
    - $ : Assert the end of the string.
The resulting list, after splitting the text using the provided pattern, is ['abc', 'xyz', '"lm,no,pq"', 'uvw'].
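Note that the regex result still carries the surrounding double quotes. If you want the quote-free output from the problem formulation, one possible sketch is to strip the quotes from each field afterwards. The helper name split_outside_quotes is hypothetical, and the sketch assumes the quotes in the input are balanced.

```python
import re

# Same pattern as above: a comma (plus optional whitespace) that is
# followed by an even number of double quotes up to the end of the string.
PATTERN = r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)'

def split_outside_quotes(text):
    """Hypothetical helper: split on commas outside quotes, then drop the quotes."""
    return [field.strip('"') for field in re.split(PATTERN, text)]

print(split_outside_quotes('abc,xyz,"lm,no,pq",uvw'))
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
print(split_outside_quotes('name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'))
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```

Because of the \s* in the pattern, this version works whether or not the commas are followed by spaces.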
Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
Related Read: Python Regex Split
Method 2: Use shlex
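Approach: Use shlex.split to split the string on whitespace while treating anything inside double quotes as a single token. Because each comma in the input is followed by a space, this breaks the string at every delimiter except the commas inside the quotes. The trailing commas are then stripped from each token.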
Code:
import shlex
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
print([x.strip(',') for x in shlex.split(text)])
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
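It can help to look at the raw tokens before the commas are stripped. The following sketch prints them and also shows a limitation: shlex.split separates tokens on whitespace, so an input without spaces after the commas is not split at all. The variable names raw_tokens and no_spaces are only illustrative.

```python
import shlex

text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# Tokens are separated at the spaces; the delimiter commas stay attached
# to the end of each token, which is why strip(',') is applied afterwards.
raw_tokens = shlex.split(text)
print(raw_tokens)
# ['name:Kelly,', 'age:26,', 'salary:$1,234,108,', 'ID:1111']

# Without spaces there is nothing to split on, so the whole string
# comes back as a single token (with the quotes removed).
no_spaces = shlex.split('abc,xyz,"lm,no,pq",uvw')
print(no_spaces)
# ['abc,xyz,lm,no,pq,uvw']
```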
Let’s see what the official documentation says about shlex.split:
shlex.split(s, comments=False, posix=True)
Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
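If the commas are not followed by spaces, one possible variation (not part of the method above) is to build a shlex.shlex lexer directly and tell it to treat the comma as its delimiter. This is only a sketch: the helper name split_on_commas is hypothetical, and it assumes the fields contain no spaces that should be trimmed.

```python
import shlex

def split_on_commas(text):
    """Hypothetical helper: split on commas, keeping quoted parts together."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ','         # treat the comma as the only delimiter
    lexer.whitespace_split = True  # split tokens only at the delimiter
    lexer.commenters = ''          # do not treat '#' as a comment start
    return list(lexer)

print(split_on_commas('abc,xyz,"lm,no,pq",uvw'))
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
```

In POSIX mode the lexer removes the surrounding quotes itself, so no extra strip step is needed here.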
I hope this article helped you and answered your queries. Please subscribe and stay tuned for more interesting solutions in the future.
Recommended Read: Python Split String Double Quotes