Use shlex.split(text) to split the given string at whitespace while keeping quoted substrings intact. You can then strip away the remaining comma characters like so:
[x.strip(',') for x in shlex.split(text)]
import shlex

text = 'abc, xyz, "lm,no,pq", uvw'
print([x.strip(',') for x in shlex.split(text)])
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
⚡Problem: Given a string in Python, how will you split it using a certain delimiter, except when the delimiter occurs within a substring that is enclosed in quotes?
# Input
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
# Expected Output
['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
Discussion: In the above problem, the string is split at each occurrence of a comma. However, the substring "salary:$1,234,108" is enclosed in double quotes and contains commas of its own. Because those commas lie inside the quotes, the string is not split there. This is exactly what you need to achieve in this problem.
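To see why a plain split is not enough, here is a minimal sketch that applies str.split(',') to the input from the problem statement. The variable name naive_parts is only illustrative.

```python
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# Naive approach: split at every comma, ignoring the quotes.
naive_parts = text.split(',')
print(naive_parts)
# ['name:Kelly', ' age:26', ' "salary:$1', '234', '108"', ' "ID:1111"']
```

The quoted salary field is cut into three pieces, which is the behavior the methods below avoid.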
Method 1: Use re.split()
Use regex split to split the string accordingly. Let’s have a look at the code — it looks complicated but I’ll explain it in a minute!
import re

text = 'abc,xyz,"lm,no,pq",uvw'
print(re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text))
# ['abc', 'xyz', '"lm,no,pq"', 'uvw']
The code imports the re module and uses its split() function to split the given text string into a list, while preserving the content within the double quotes, even if it contains commas:
import re: Import the regular expression module
text = 'abc,xyz,"lm,no,pq",uvw': Define the input text string.
re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text): The re.split() function splits the input text based on the provided regular expression pattern:
,: Match a comma.
\s*: Match zero or more whitespace characters.
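(?=(?:[^"]*"[^"]*")*[^"]*$): This is a positive lookahead assertion that ensures the comma being matched is not inside double quotes. It does so by requiring that an even number of double quotes remain between the comma and the end of the string:
(?:[^"]*"[^"]*")*: This non-capturing group matches one complete pair of double quotes together with the non-quote characters around it. Repeated zero or more times, it matches an even number of double quotes.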
[^"]*: Match zero or more characters that are not double quotes.
$: Assert the end of the string.
The resulting list, after splitting the text using the provided pattern, is ['abc', 'xyz', '"lm,no,pq"', 'uvw']. Note that the quoted field keeps its surrounding double quotes.
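If you also want to drop the surrounding double quotes, you could wrap the pattern in a small helper. This is a sketch, not part of the original solution: the function name split_outside_quotes and the constant COMMA_OUTSIDE_QUOTES are hypothetical, and it assumes the double quotes in the input are balanced.

```python
import re

# Same pattern as above: a comma (plus trailing whitespace) that is
# followed by an even number of double quotes up to the end of the string.
COMMA_OUTSIDE_QUOTES = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(text):
    """Split text at commas outside double quotes and strip the quotes."""
    return [part.strip('"') for part in COMMA_OUTSIDE_QUOTES.split(text)]


print(split_outside_quotes('abc,xyz,"lm,no,pq",uvw'))
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
print(split_outside_quotes('name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'))
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```

Compiling the pattern once keeps the helper readable and avoids repeating the long expression at each call site.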
✅ Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
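As a quick illustration of this behavior, the sketch below runs the example from the note and adds a second call with a comma-plus-whitespace pattern. The input 'a, b,c' is made up for the demonstration.

```python
import re

# Split at every occurrence of the character 'a'.
print(re.split('a', 'bbabbbab'))
# ['bb', 'bbb', 'b']

# The pattern can cover more than one character:
# here a comma followed by optional whitespace.
print(re.split(r',\s*', 'a, b,c'))
# ['a', 'b', 'c']
```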
🌎Related Read: Python Regex Split
Method 2: Use shlex
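Use shlex.split() to split the string, then strip the trailing comma from each token. shlex.split() separates tokens at whitespace and keeps quoted substrings together, so this works when each comma outside the quotes is followed by a space.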
import shlex

text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
print([x.strip(',') for x in shlex.split(text)])
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
Let’s see what the official documentation says about shlex.split:
shlex.split(s, comments=False, posix=True) 🡆 Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
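Because shlex.split() separates tokens at whitespace, the approach above depends on a space following each comma outside the quotes. If the input may lack those spaces, one option is to configure a shlex.shlex instance to treat the comma as its separator. The following is a sketch under that assumption; the helper name split_on_commas is hypothetical.

```python
import shlex


def split_on_commas(text):
    """Split text at commas, keeping double-quoted substrings together."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ','         # treat only the comma as a separator
    lexer.whitespace_split = True  # split only at the separator
    lexer.commenters = ''          # do not treat '#' as a comment start
    # Spaces are no longer separators, so trim them from each token.
    return [token.strip() for token in lexer]


print(split_on_commas('abc,xyz,"lm,no,pq",uvw'))
# ['abc', 'xyz', 'lm,no,pq', 'uvw']
print(split_on_commas('name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'))
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```

In POSIX mode the lexer removes the enclosing quotes itself, so no extra stripping of quote characters is needed.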
I hope this article helped you and answered your queries. Please subscribe and stay tuned for more interesting solutions in the future.
🌎Recommended Read: Python Split String Double Quotes