Python | Split String Except Quotes

Summary: Use re.split() with a lookahead pattern such as r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)' to split the given string at a delimiter except inside quotes. Another solution is to use the shlex module.

Minimal Example

# Given string
text = 'abc, xyz, "lm,no,pq", uvw'

# Method 1
import re

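# Split at commas that are followed by an even number of double quotes,
# i.e. commas outside a quoted substring, then drop the surrounding quotes.
pattern = r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)'
print([part.strip('"') for part in re.split(pattern, text)])
# ['abc', 'xyz', 'lm,no,pq', 'uvw']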

# Method 2
import shlex

print([x.strip(',') for x in shlex.split(text)])
# ['abc', 'xyz', 'lm,no,pq', 'uvw']

Problem Formulation

Problem: Given a string in Python, how will you split it using a certain delimiter, except when the delimiter occurs within a substring that is enclosed in quotes?

Example

# Input
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
# Expected Output
['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']

Discussion: In the above problem, the string is split at each comma. However, the substring "salary:$1,234,108" is enclosed in double quotes and contains commas of its own. Because those commas sit inside the quotes, the string must not be split there, and the substring has to stay in one piece. This is exactly what you need to achieve in this problem.
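
To see why a plain split is not enough, here is a small sketch that applies str.split(',') to the example input. The variable name naive_parts is only for illustration.

```python
# Naive approach: str.split knows nothing about quotes, so it also cuts
# the quoted salary field at its internal commas.
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

naive_parts = text.split(',')
print(naive_parts)
# ['name:Kelly', ' age:26', ' "salary:$1', '234', '108"', ' "ID:1111"']
```

The salary ends up spread over three items, and the quotes and leading spaces are still attached.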

Method 1: Use re.split

Approach: Use re.split() with a pattern that matches only the commas that lie outside quotes. It is important to understand the expression used to split the given string. Let’s have a look at the code and then we will understand how it works.

Code:

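import re
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
pattern = r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)'
# Split at commas outside quotes, then strip the surrounding double quotes
print([part.strip('"') for part in re.split(pattern, text)])

# OUTPUT: ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']

Explanation: Let’s break down the regex used in the above solution. “,\s*” is the item to be matched during the split: a comma plus any whitespace after it. (?=(?:[^"]*"[^"]*")*[^"]*$) is a positive lookahead assertion, which means the comma only matches if the rest of the string contains an even number of double quotes. In other words, the comma must not be inside a quoted substring. The lookahead only checks the text that follows and does not consume it. The re.split() function will split the given string only on the occurrence of a match. Hence, it will only split on the occurrence of non-quoted commas, and strip('"') then removes the double quotes around the quoted fields. This is one pattern that works when the double quotes are balanced. It does not handle escaped or unbalanced quotes.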
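
A lookahead checks the text after a match without consuming it. The following small sketch uses a hypothetical sample string (not one of the article's inputs) to show how a positive lookahead (?=...) differs from a negative lookahead (?!...) when used with re.split:

```python
import re

sample = 'a,"b",c'

# Positive lookahead (?="): split only at commas directly followed by a quote.
followed_by_quote = re.split(r',(?=")', sample)
print(followed_by_quote)
# ['a', '"b",c']

# Negative lookahead (?!"): split only at commas NOT directly followed by a quote.
not_followed_by_quote = re.split(r',(?!")', sample)
print(not_followed_by_quote)
# ['a,"b"', 'c']
```

In both cases the quote itself stays in the result, because the lookahead is only a condition on the match and not part of it.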

Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

🌎Related Read: Python Regex Split

Method 2: Use shlex

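Approach: Use shlex.split() to split the string with shell-like rules: it breaks the string at whitespace and keeps each double-quoted substring together as one token. The comma that trails each token is then removed with strip(','). This works here because every separating comma is followed by a space.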

Code:

import shlex
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
print([x.strip(',') for x in shlex.split(text)])

# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']

Let’s see what the official documentation says about shlex.split:

shlex.split(s, comments=False, posix=True): Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
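
To make the role of strip(',') clearer, here is a small sketch that prints the raw tokens from shlex.split() in its default POSIX mode. The names raw_tokens, compact and compact_tokens, and the second input string, are made up for illustration.

```python
import shlex

text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# shlex.split breaks on whitespace, keeps quoted substrings together,
# and drops the quote characters (default POSIX mode).
raw_tokens = shlex.split(text)
print(raw_tokens)
# ['name:Kelly,', 'age:26,', 'salary:$1,234,108,', 'ID:1111']

# The commas are still attached, hence the strip(',') afterwards.
print([token.strip(',') for token in raw_tokens])
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']

# Without a space after the comma, shlex sees one token and nothing is split.
compact = 'a,b, "c,d"'
compact_tokens = [token.strip(',') for token in shlex.split(compact)]
print(compact_tokens)
# ['a,b', 'c,d']
```

So this approach depends on whitespace following each separating comma. For input without that whitespace, the regex from Method 1 is likely the better fit.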

Conclusion

In this article, we learned two ways of splitting a string at a delimiter except inside quotes. The first approach used a regex split with a lookahead, while the second used the shlex module. I hope this article helped you and answered your queries. Please subscribe and stay tuned for more interesting solutions in the future.

🌎Recommended Read: Python Split String Double Quotes

