Python | Split String Multiple Whitespaces

🍎Summary: The simplest way to split a string at runs of whitespace is to call the string method without arguments, like so: given_string.split(). Alternatively, you can use functions from the re (regular expressions) module, such as re.split(), re.findall(), or re.sub(), to split the string at multiple whitespace characters.

Minimal Example:

import re

text = "mouse\nsnake\teagle human"
# Method 1
print(text.split())

# Method 2
res = re.split(r"\s+", text)
print(res)

# Method 3
res = re.sub(r'\s+', ',', text).split(',')
print(res)

# Method 4
print(re.findall(r'\S+', text))

# Each method prints: ['mouse', 'snake', 'eagle', 'human']

Problem Formulation

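📜Problem: Given a string whose words are separated by different whitespace characters (spaces, tabs, newlines, carriage returns), possibly several in a row, how do you split it into a list of words?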

Example

# Input
text = "abc\nlmn\tpqr   xyz\rmno"
# Output
['abc', 'lmn', 'pqr', 'xyz', 'mno']

There are numerous ways of solving the given problem. So, without further ado, let us dive into the solutions.

Method 1: Using Regex

Regular expressions are a flexible way to deal with multiple delimiters. The re module offers several functions that you can use to split the given string. Let's go through them one by one.

1.1 Using re.split

The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
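A quick sketch that reproduces the example above and shows one behavior worth knowing: when the pattern matches at the very start or end of the string, re.split() returns an empty string at that edge. The second input string is made up for illustration.

```python
import re

# Splitting at every 'a' yields the pieces between the matches.
parts = re.split('a', 'bbabbbab')
print(parts)  # ['bb', 'bbb', 'b']

# A match at the start or end of the string produces an empty string there.
edges = re.split('a', 'abba')
print(edges)  # ['', 'bb', '']
```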

📚Recommended Read:  Python Regex Split.

Approach: To split the string at runs of whitespace characters, use re.split(r"\s+", text). The special sequence \s matches any single whitespace character (space, tab, newline, carriage return, and so on), and the + quantifier extends the match to one or more consecutive whitespace characters, so each run is treated as a single delimiter.

Code:

import re
text = "abc\nlmn\tpqr   xyz\rmno"
res = re.split(r"\s+", text)
print(res)

# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
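The sample string has no whitespace at its ends. If yours might, re.split(r"\s+", ...) returns empty strings at the edges. Here is a minimal sketch of two common workarounds, stripping first or filtering out empty strings afterwards; the padded input is made up for illustration.

```python
import re

padded = "  abc\nlmn\tpqr   xyz\rmno \n"

# Leading and trailing whitespace produce empty strings at both ends.
raw = re.split(r"\s+", padded)
print(raw)  # ['', 'abc', 'lmn', 'pqr', 'xyz', 'mno', '']

# Workaround 1: strip the string before splitting.
stripped = re.split(r"\s+", padded.strip())
print(stripped)  # ['abc', 'lmn', 'pqr', 'xyz', 'mno']

# Workaround 2: drop the empty strings after splitting.
filtered = [part for part in raw if part]
print(filtered)  # ['abc', 'lmn', 'pqr', 'xyz', 'mno']
```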

1.2 Using re.findall

The re.findall(pattern, string) method scans the string from left to right, searching for all non-overlapping matches of the pattern, and returns them as a list of strings in the order in which they were found.
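To see the left-to-right, non-overlapping behavior in isolation, here is a small sketch using arbitrary input strings unrelated to the whitespace problem:

```python
import re

# \d+ grabs each maximal run of digits, scanning left to right.
runs = re.findall(r"\d+", "a1b22c333")
print(runs)  # ['1', '22', '333']

# Matches never overlap: once 'aa' is consumed, scanning resumes after it.
pairs = re.findall(r"aa", "aaaa")
print(pairs)  # ['aa', 'aa']
```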

📚Recommended Read: Python re.findall() – Everything You Need to Know

Code:

import re

text = "abc\nlmn\tpqr   xyz\rmno"
print(re.findall(r'\S+', text))

# ['abc', 'lmn', 'pqr', 'xyz', 'mno']

Explanation: In the expression re.findall(r'\S+', text), \S matches any single character that is not whitespace, and + extends the match to one or more such characters. Each maximal run of non-whitespace characters (letters, digits, punctuation, and so on) is therefore returned as one element of the list, while the whitespace between the runs is skipped.
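Because \S matches any non-whitespace character, punctuation stays attached to the neighboring word. And since findall returns only the matches, leading or trailing whitespace never produces empty strings. A brief sketch with a made-up sentence:

```python
import re

sentence = "  Hello, world!\tHow's it going?\n"

# Every maximal run of non-whitespace characters becomes one token.
tokens = re.findall(r"\S+", sentence)
print(tokens)  # ['Hello,', 'world!', "How's", 'it', 'going?']
```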

1.3 Using re.sub

The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
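A minimal sketch reproducing the example above, plus a related use of re.sub() on whitespace: collapsing every run of whitespace into a single space, which normalizes the string without splitting it.

```python
import re

# Every 'a' is replaced by 'b'.
replaced = re.sub('a', 'b', 'aabb')
print(replaced)  # bbbb

# Collapse each run of whitespace characters into one space.
text = "abc\nlmn\tpqr   xyz\rmno"
normalized = re.sub(r"\s+", " ", text)
print(normalized)  # abc lmn pqr xyz mno
```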

Approach: Use re.sub() to replace every run of whitespace characters in the given string with a comma. The string then contains commas where the whitespace used to be, so you can split it with the ordinary str.split() method, passing the comma as the delimiter.
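One caveat with this approach: the comma is only a safe stand-in if the original string never contains a comma of its own. The sketch below uses a made-up input to show what goes wrong otherwise, compared with str.split(), which does not have this problem.

```python
import re

text = "apple,pie banana\tcherry"

# The original comma is indistinguishable from the inserted ones.
via_sub = re.sub(r"\s+", ",", text).split(",")
print(via_sub)  # ['apple', 'pie', 'banana', 'cherry']

# Splitting on whitespace directly keeps 'apple,pie' intact.
direct = text.split()
print(direct)  # ['apple,pie', 'banana', 'cherry']
```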

Code:

import re
text = "abc\nlmn\tpqr   xyz\rmno"
res = re.sub(r'\s+', ',', text).split(',')
print(res)

# ['abc', 'lmn', 'pqr', 'xyz', 'mno']


Method 2: Using split()

When called without arguments, str.split() splits the string at runs of consecutive whitespace characters (spaces, tabs, newlines, and so on) and ignores whitespace at the beginning and end of the string, so the result contains no empty strings. This default behavior is exactly what the problem asks for, so a plain split() call is all you need.

Code:

text = "abc\nlmn\tpqr   xyz\rmno"
print(text.split())
# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
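It matters that no argument is passed. A short sketch with an arbitrary input contrasting split() with split(' '): an explicit single-space separator splits at every individual space, produces empty strings for consecutive or edge spaces, and ignores tabs and newlines.

```python
messy = " abc  lmn\tpqr "

# No argument: split on runs of any whitespace, ignore leading/trailing whitespace.
default_split = messy.split()
print(default_split)  # ['abc', 'lmn', 'pqr']

# Explicit ' ': split at every single space only; the tab is left in place.
space_split = messy.split(' ')
print(space_split)  # ['', 'abc', '', 'lmn\tpqr', '']
```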

📚Recommended Digest: Python String split()

Conclusion

We have solved the given problem using several approaches. For splitting at plain whitespace, str.split() without arguments is the simplest option; the regex functions become useful when you need more control over what counts as a delimiter. Feel free to explore and try out the other options mentioned above. I hope this article helped you in your Python coding journey. Please subscribe and stay tuned for more interesting articles.
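If you want to check how the methods compare on your own data, a small sketch like the following runs all four on the same input. The helper name compare_methods is hypothetical. On input with leading or trailing whitespace, only str.split() and re.findall() avoid empty strings at the edges.

```python
import re

def compare_methods(s):
    """Return the result of each splitting approach for the string s."""
    return {
        "str.split": s.split(),
        "re.split": re.split(r"\s+", s),
        "re.sub + split": re.sub(r"\s+", ",", s).split(","),
        "re.findall": re.findall(r"\S+", s),
    }

# The tab at the start and the newline at the end expose the differences.
results = compare_methods("\tabc lmn\n")
for name, result in results.items():
    print(f"{name:15} {result}")
```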

Happy Pythoning! 🐍 

