Python | Split String Except Quotes

⭐ Summary: Use shlex.split(text) to split a string on whitespace while keeping quoted substrings intact. If the fields are separated by a comma followed by a space, you can strip away the remaining comma characters like so: [x.strip(',') for x in shlex.split(text)]

Minimal Example

import shlex
text = 'abc, xyz, "lm,no,pq", uvw'
print([x.strip(',') for x in shlex.split(text)])
# ['abc', 'xyz', 'lm,no,pq', 'uvw']

Problem Formulation

⚡ Problem: Given a string in Python, how do you split it on a certain delimiter, except where the delimiter occurs inside a substring that is enclosed in quotes?

Example

# Input
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
# Expected Output
['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']

Discussion: In the above problem, the string has been split at each occurrence of a comma. However, the substring "salary:$1,234,108" is enclosed in double quotes and contains commas of its own.

Because those commas sit inside the double quotes, they must not act as delimiters, so the substring stays in one piece. This is exactly what you need to achieve in this problem.
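
To see why a plain split is not enough, here is a quick sketch that applies Python's built-in str.split() to the same input. The variable name naive is only used for illustration.

```python
# Same input as in the problem statement
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# str.split() knows nothing about quotes, so it cuts at every comma
naive = text.split(',')
print(naive)
# ['name:Kelly', ' age:26', ' "salary:$1', '234', '108"', ' "ID:1111"']
```

The salary ends up in three pieces, and the quote characters are scattered across them.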

Method 1: Use re.split()

Use re.split() with a pattern that matches only the commas that lie outside of double quotes. Let's have a look at the code. It looks complicated, but I'll explain it in a minute!

import re
text = 'abc,xyz,"lm,no,pq",uvw'
print(re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text))
# ['abc', 'xyz', '"lm,no,pq"', 'uvw']

Explanation:

The code imports the re module and uses its split() function to split the given text string into a list, while preserving the content within the double quotes, even if it contains commas:

  1. import re: Import the regular expression module re.
  2. text = 'abc,xyz,"lm,no,pq",uvw': Define the input text string.
  3. re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', text): The re.split() function splits the input text based on the provided regular expression pattern:
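    • ,: Match a comma.
    • \s*: Match zero or more whitespace characters.
    • (?=(?:[^"]*"[^"]*")*[^"]*$): This is a positive lookahead assertion. It succeeds only if an even number of double quotes remains between the comma and the end of the string, which means the comma being matched is not inside double quotes:
      • (?:[^"]*"[^"]*")*: This non-capturing group matches one complete pair of double quotes, including the non-quote characters before and between them. The trailing * repeats the group zero or more times, so it consumes an even number of double quotes.
      • [^"]*: Match zero or more characters that are not double quotes.
      • $: Assert the end of the string.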

The resulting list, after splitting the text using the provided pattern, is ['abc', 'xyz', '"lm,no,pq"', 'uvw']. Note that the double quotes stay part of the third element.
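
The regex result still carries the double quotes around the quoted field. If you want the output from the problem formulation, one possible approach is to strip the quotes afterwards. The following is a minimal sketch: the helper name split_outside_quotes and the constant PATTERN are made up for this example, and it assumes that double quotes only ever wrap a whole field and are always balanced.

```python
import re

# Same pattern as above: a comma (plus optional whitespace) that is
# followed by an even number of double quotes up to the end of the string
PATTERN = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_outside_quotes(text):
    """Split on commas outside double quotes and drop the wrapping quotes."""
    return [part.strip('"') for part in PATTERN.split(text)]

text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
print(split_outside_quotes(text))
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```

Keep in mind that strip('"') removes every leading and trailing double quote of a field, so a field that is supposed to start or end with a literal quote would lose it.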

✅ Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

🌎 Related Read: Python Regex Split


Method 2: Use shlex

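Approach: Use shlex.split to split the string on whitespace while treating each double-quoted substring as a single token, then strip the trailing comma from each token. This works here because every comma that separates two fields is followed by a space, whereas the commas inside the double quotes are not.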

Code:

import shlex
text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'
print([x.strip(',') for x in shlex.split(text)])
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
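
To make the two steps visible, here is the same example again with the intermediate result printed. The variable names tokens and cleaned are only used for illustration.

```python
import shlex

text = 'name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'

# Step 1: shlex.split() cuts at whitespace and removes the double quotes,
# but the separating commas are still attached to the tokens
tokens = shlex.split(text)
print(tokens)
# ['name:Kelly,', 'age:26,', 'salary:$1,234,108,', 'ID:1111']

# Step 2: strip the leftover commas from both ends of each token
cleaned = [t.strip(',') for t in tokens]
print(cleaned)
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```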

Let's see what the official documentation says about shlex.split:

shlex.split(s, comments=False, posix=True) 🔆 Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
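
Since shlex.split cuts at whitespace, it returns a single token for an input like 'abc,xyz,"lm,no,pq",uvw' that has no space after the commas. One possible workaround is to build a shlex.shlex instance yourself and tell it to treat the comma as its only separator. This is a sketch rather than a definitive solution: the function name split_on_commas is made up for this example, and it assumes the text contains no apostrophes or backslashes, because shlex treats those as quote and escape characters.

```python
import shlex

def split_on_commas(text):
    """Split on commas outside quotes, using a comma-delimited shlex lexer."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ','         # the comma is the only separator
    lexer.whitespace_split = True  # split only at separator characters
    lexer.commenters = ''          # do not treat '#' as a comment start
    # Spaces are no longer separators, so trim them from each token
    return [token.strip() for token in lexer]

print(split_on_commas('abc,xyz,"lm,no,pq",uvw'))
# ['abc', 'xyz', 'lm,no,pq', 'uvw']

print(split_on_commas('name:Kelly, age:26, "salary:$1,234,108", "ID:1111"'))
# ['name:Kelly', 'age:26', 'salary:$1,234,108', 'ID:1111']
```

With this variant, the commas are consumed by the lexer itself, so no strip(',') step is needed afterwards.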


I hope this article helped you and answered your queries. Please subscribe and stay tuned for more interesting solutions in the future.

🌎 Recommended Read: Python Split String Double Quotes

