Python | Split String by Multiple Characters/Delimiters


Summary: The most efficient way to split a string using multiple characters is to use Python’s regex library as re.split("pattern", "given_string"). An alternate solution is to replace the delimiters in the given string with a whitespace character and then split the string.

Minimal Example:

import re

text = "abc!lmn pqr xyz@mno"
# Method 1: re.split() on runs of non-word characters
res = re.split(r"\W+", text)
print(res)
# OUTPUT: ['abc', 'lmn', 'pqr', 'xyz', 'mno']


# Method 2: re.split() with a character class of delimiters
text = "one1two2three"
print(re.split(r"[12]", text))
# OUTPUT: ['one', 'two', 'three']

# Method 3: replace each delimiter with a space, then split
for i in ['1', '2']:
    text = text.replace(i, ' ')
print(text.split())
# OUTPUT: ['one', 'two', 'three']

Problem Formulation

📜 Problem: Given a string, how will you split it using multiple characters/delimiters?

You might have created a list of split strings using a given separator. But when it comes to splitting the string with multiple delimiters, it might be a little confusing and complex.

Examples

Let’s have a look at a couple of examples that require you to split a given string on the occurrence of multiple separators.

Example 1: Split string using multiple non-alphanumeric separators:

# Input
text = "Welcome!Finxter-Master the art@Python"
sep = ['!', '-', ' ', '@']
# Output
['Welcome', 'Finxter', 'Master', 'the', 'art', 'Python']

Example 2: Split string using multiple characters (alphanumeric):

# Input
text = "abcZlmnDpqrsghu@jil.org"
sep = ['Z', 'D', 's', '@', '.']
# Output
['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']

So, we have two different scenarios – (1) The delimiters are not alphanumeric, (2) The delimiters are a mixed blend of alphanumeric characters and non-alphanumeric characters.


There are numerous ways of solving the given problem. So, without further ado, let us dive into the solutions.

Method 1: Using Regex

The best way to deal with multiple delimiters is to use the flexibility of the regular expressions library. There are different functions available in the regex library that you can use to split the given string. Let’s go through each one by one.

Using re.split

The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
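As a small sketch of this behavior, the snippet below repeats the example above and adds one made-up input with two adjacent separators. Adjacent matches leave an empty string in the result, which is why the patterns used later in this article end with the + quantifier.

```python
import re

# The example from the text: split on every 'a'
simple = re.split('a', 'bbabbbab')
print(simple)
# ['bb', 'bbb', 'b']

# Two adjacent 'a' characters leave an empty string between them
adjacent = re.split('a', 'bbaab')
print(adjacent)
# ['bb', '', 'b']

# Treating a run of 'a' characters as one separator avoids the empty string
merged = re.split('a+', 'bbaab')
print(merged)
# ['bb', 'b']
```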

🌎Recommended Read:  Python Regex Split.

Code:

import re

# Solution to Example 1
text = "Welcome!Finxter-Master the art@Python"
res = re.split(r"\W+", text)
print(res)
# OUTPUT: ['Welcome', 'Finxter', 'Master', 'the', 'art', 'Python']


# Solution to Example 2
text = "abcZlmnDpqrsghu@jil.org"
sep = ['Z', 'D', 's', '@', '.']
res = re.split(r"[ZDs@.]", text)
print(res)
# OUTPUT: ['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']

Explanation:

  • To split the string on non-alphanumeric characters, use re.split(r"\W+", text). The special sequence \W matches any single character that is not a word character (a letter, a digit, or the underscore), and the + quantifier extends the match to one or more such characters in a row. Thus, every run of non-word characters acts as one separator.
  • To split the string on specific characters, including alphanumeric ones, list them inside a character class (square brackets): re.split(r"[ZDs@.]", text). A character class matches any one of the characters it contains, so no either-or (|) metacharacter is needed. Inside square brackets, | and spaces are treated as literal characters, and . loses its special meaning. Thus, whenever the script encounters any of the listed characters, it splits the given string.
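If the separators are only known at run time, the pattern can be built from the list instead of typed by hand. The following is a minimal sketch under that assumption; build_pattern is a hypothetical helper name, not part of the re module. It joins the separators with |, which works as an either-or operator here because it sits outside square brackets.

```python
import re

def build_pattern(separators):
    """Hypothetical helper: join separators into an alternation pattern."""
    # re.escape makes characters such as '.' match literally;
    # empty strings are skipped because they cannot act as separators
    return "|".join(re.escape(s) for s in separators if s)

text = "abcZlmnDpqrsghu@jil.org"
sep = ['Z', 'D', 's', '@', '.']

pattern = build_pattern(sep)
res = re.split(pattern, text)
print(res)
# ['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']
```

For single-character separators this gives the same result as the character class [ZDs@.]. The alternation form also accepts separators that are longer than one character.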

Using re.findall

The re.findall(pattern, string) method scans the string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.

🌎Recommended Read: Python re.findall() – Everything You Need to Know

Code:

import re

text = "Welcome!Finxter-Master the art@Python"
sep = ['!', '-', ' ', '@']
res = re.findall(r"[\w']+", text)
print(res)
# OUTPUT: ['Welcome', 'Finxter', 'Master', 'the', 'art', 'Python']

text = "abcZlmnDpqrsghu@jil.org"
sep = ['Z', 'D', 's', '@', '.']
res = re.findall(r"[^ZDs@.]+", text)
print(res)
# OUTPUT: ['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']

Explanation:

  • In the first expression, i.e., re.findall(r"[\w']+", text), every run of word characters is found and stored in a list. Here, [\w']+ matches one or more consecutive word characters (letters a-z and A-Z, digits 0-9, and the underscore _) or apostrophes. The apostrophe is included so that words such as don't stay in one piece.
  • In the second expression, i.e., re.findall(r"[^ZDs@.]+", text), every run of characters that are not delimiters is grouped together. Let’s break down the pattern. The ^ at the start of the square brackets negates the character class, so [^ZDs@.] matches any single character except ‘Z’, ‘D’, ‘s’, ‘@’ and ‘.’, and the + extends the match to one or more such characters. Thus, the script collects characters until it reaches one of the listed delimiters, ends the current group there, and starts the next group after the delimiter.
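One practical difference between re.split and re.findall shows up when the string starts or ends with a delimiter. The sketch below uses a made-up input to illustrate it: re.split returns empty strings at the edges, while re.findall returns only the matched groups.

```python
import re

# Made-up input with a delimiter at both ends
text = "!Welcome!Finxter-Master@"

split_res = re.split(r"\W+", text)
print(split_res)
# ['', 'Welcome', 'Finxter', 'Master', '']

find_res = re.findall(r"\w+", text)
print(find_res)
# ['Welcome', 'Finxter', 'Master']

# Dropping the empty strings makes the re.split result match
cleaned = [part for part in split_res if part]
print(cleaned)
# ['Welcome', 'Finxter', 'Master']
```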



Method 2: Using replace() and split()

This might not be the most efficient solution, but keeping in mind that not all of us are extremely fond of regex, you can use this solution.

Approach: The idea here is to replace all the given delimiters present in the string with a normal whitespace character and then split the modified string to get the list of split substrings.

Code:

def splitter(txt, delim):
    # Replace every delimiter with a space, then split on whitespace
    for d in delim:
        txt = txt.replace(d, ' ')
    return txt.split()

# Example 1
text = "Welcome!Finxter-Master the art@Python"
sep = ['!', '-', ' ', '@']
print(splitter(text, sep))

# Example 2
text = "abcZlmnDpqrsghu@jil.org"
sep = ['Z', 'D', 's', '@', '.']
print(splitter(text, sep))

Output:

['Welcome', 'Finxter', 'Master', 'the', 'art', 'Python']
['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']
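When every delimiter is a single character, the same replace-then-split idea can also be written with str.translate, which substitutes all delimiters in one pass. This is a variation on the approach above rather than the original solution, and translate_splitter is a hypothetical name used only for illustration.

```python
def translate_splitter(txt, delim):
    """Hypothetical variant of splitter() for single-character delimiters."""
    # Build a translation table that maps each delimiter to a space;
    # entries that are not exactly one character long are ignored
    table = str.maketrans({d: ' ' for d in delim if len(d) == 1})
    return txt.translate(table).split()

# Example 1
res1 = translate_splitter("Welcome!Finxter-Master the art@Python", ['!', '-', ' ', '@'])
print(res1)
# ['Welcome', 'Finxter', 'Master', 'the', 'art', 'Python']

# Example 2
res2 = translate_splitter("abcZlmnDpqrsghu@jil.org", ['Z', 'D', 's', '@', '.'])
print(res2)
# ['abc', 'lmn', 'pqr', 'ghu', 'jil', 'org']
```

Like splitter, this version also splits on any whitespace already present in the text, because split() without arguments treats all whitespace as a separator.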

📚 Readers Digest:
(1) Python String split()
(2) Python String replace()

Conclusion

We have successfully solved the given problem using different approaches. I hope this article helped you in your Python coding journey. Please subscribe and stay tuned for more interesting articles.

Happy Pythoning! 🐍

