🍎Summary: The most efficient way to split a string at runs of whitespace is to call the string's split method without arguments: given_string.split(). An alternative approach is to use functions from Python's re (regular expression) module, such as re.split, re.findall, or re.sub, to split the string at multiple whitespace characters.
Minimal Example:
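import re

text = "mouse\nsnake\teagle human"

# Method 1: str.split() without arguments
print(text.split())

# Method 2: re.split() at runs of whitespace
res = re.split(r"\s+", text)
print(res)

# Method 3: re.sub() followed by str.split(',')
res = re.sub(r'\s+', ',', text).split(',')
print(res)

# Method 4: re.findall() on runs of non-whitespace characters
print(re.findall(r'\S+', text))

# Each method prints: ['mouse', 'snake', 'eagle', 'human']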
Problem Formulation
📜Problem: Given a string, how do you split it at runs of one or more whitespace characters (spaces, tabs, newlines, and so on)?
Example
# Input
text = "abc\nlmn\tpqr xyz\rmno"

# Output
['abc', 'lmn', 'pqr', 'xyz', 'mno']
There are numerous ways of solving the given problem. So, without further ado, let us dive into the solutions.
Method 1: Using Regex
A flexible way to deal with multiple delimiters is Python's regular expression module, re. It provides several functions that you can use to split the given string. Let’s go through them one by one.
1.1 Using re.split
The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
📚Recommended Read: Python Regex Split.
Approach: To split the string at whitespace characters, use re.split(r"\s+", text). Here, \s is a special sequence that matches any single whitespace character (space, tab, newline, carriage return, and so on), and the + quantifier extends the match to one or more of them in a row, so each run of whitespace acts as a single delimiter.
Code:
import re

text = "abc\nlmn\tpqr xyz\rmno"
# Split at every run of one or more whitespace characters
res = re.split(r"\s+", text)
print(res)
# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
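One detail worth knowing about re.split is how it treats whitespace at the edges of the string. The following is a minimal sketch, assuming a made-up input called padded that starts and ends with whitespace; the variable names are illustrative and not part of the original example.

```python
import re

# Hypothetical input with leading and trailing whitespace
padded = "  abc\nlmn\tpqr  "

# re.split keeps the empty strings before the first and after the last delimiter
raw_parts = re.split(r"\s+", padded)
print(raw_parts)  # ['', 'abc', 'lmn', 'pqr', '']

# Stripping the string first avoids the empty entries at the edges
clean_parts = re.split(r"\s+", padded.strip())
print(clean_parts)  # ['abc', 'lmn', 'pqr']
```

If the input may be padded, calling strip() before re.split is one simple way to get only the words.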
1.2 Using re.findall
The re.findall(pattern, string) method scans the string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of the matched strings in the order in which they were found.
📚Recommended Read: Python re.findall() – Everything You Need to Know
Code:
import re

text = "abc\nlmn\tpqr xyz\rmno"
# Collect every run of one or more non-whitespace characters
print(re.findall(r'\S+', text))
# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
Explanation: In the expression re.findall(r'\S+', text), every run of non-whitespace characters is found and stored in a list. Here, \S+ matches one or more consecutive characters that are not whitespace (letters, digits, punctuation, and so on), so the whitespace between the words is never part of a match.
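To illustrate that \S+ matches any non-whitespace characters, not only letters, here is a small sketch. The inputs messy and blank are invented for this illustration.

```python
import re

# Hypothetical input mixing punctuation, digits, and padding whitespace
messy = "  id=42,\tname:Ann\n"

# Punctuation and digits are non-whitespace, so they stay inside the tokens
tokens = re.findall(r"\S+", messy)
print(tokens)  # ['id=42,', 'name:Ann']

# A string that contains only whitespace yields an empty list
blank = " \t\n"
no_tokens = re.findall(r"\S+", blank)
print(no_tokens)  # []
```

Because re.findall collects the words rather than cutting at the delimiters, leading and trailing whitespace does not produce empty strings in the result.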
1.3 Using re.sub
The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
Approach: Use the re.sub method to replace every run of whitespace characters in the given string with a comma. The string then contains commas instead of whitespace, and you can split it with the normal string split method by passing a comma as the delimiter. Note that this only works as intended if the original string does not already contain commas.
Code:
import re

text = "abc\nlmn\tpqr xyz\rmno"
# Replace each run of whitespace with a comma, then split at the commas
res = re.sub(r'\s+', ',', text).split(',')
print(res)
# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
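The re.sub approach relies on the replacement character not occurring in the input. The sketch below uses a made-up string, text_with_comma, to show what happens when it does, and compares the result with re.split on the same input.

```python
import re

# Hypothetical input that already contains a comma
text_with_comma = "a,b\tc d"

# The existing comma becomes a split point as well
via_sub = re.sub(r"\s+", ",", text_with_comma).split(",")
print(via_sub)  # ['a', 'b', 'c', 'd']

# Splitting directly at whitespace leaves 'a,b' intact
via_split = re.split(r"\s+", text_with_comma)
print(via_split)  # ['a,b', 'c', 'd']
```

If commas can appear in your data, splitting directly at whitespace is the safer choice.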
Method 2: Using split()
By default, the split method splits a string at whitespace. That is, if you do not pass a delimiter to split, the string is split at every run of consecutive whitespace characters, and leading and trailing whitespace is ignored. You can rely on this default behavior to split the given string at multiple whitespaces just by calling split().
Code:
text = "abc\nlmn\tpqr xyz\rmno"
# With no arguments, split() splits at runs of any whitespace
print(text.split())
# ['abc', 'lmn', 'pqr', 'xyz', 'mno']
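The default behavior differs from passing a space explicitly as the delimiter. A minimal sketch with a made-up input, sample, shows the difference:

```python
# Hypothetical input with a double space, a tab, and a trailing space
sample = "a  b\tc "

# No argument: runs of any whitespace are one delimiter, edges are ignored
default_parts = sample.split()
print(default_parts)  # ['a', 'b', 'c']

# Explicit " ": every single space is a delimiter, and tabs are not split
space_parts = sample.split(" ")
print(space_parts)  # ['a', '', 'b\tc', '']
```

So for splitting at multiple or mixed whitespace, call split() without any argument rather than split(" ").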
📚Recommended Digest: Python String split()
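To get a rough feel for how the approaches compare in speed on your own machine, you could time them with the standard timeit module. This is only a sketch: the input size, the number of repetitions, and the names candidates, results, and timings are arbitrary choices made for illustration, and the measured numbers will vary between systems and Python versions.

```python
import re
import timeit

# Hypothetical test input: the example string repeated 100 times
text = " ".join(["abc\nlmn\tpqr xyz\rmno"] * 100)

# The approaches from this article, wrapped as zero-argument callables
candidates = {
    "str.split": lambda: text.split(),
    "re.split": lambda: re.split(r"\s+", text),
    "re.findall": lambda: re.findall(r"\S+", text),
}

# Run each approach once so the outputs can be compared
results = {name: fn() for name, fn in candidates.items()}

# Time 1000 calls of each approach and print the fastest first
timings = {name: timeit.timeit(fn, number=1000) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<10} {seconds:.4f} s")
```

The test input has no leading or trailing whitespace, so all three approaches return the same list and the comparison is like for like.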
Conclusion
We have successfully solved the given problem using different approaches. Simply using split could do the job for you. However, feel free to explore and try out the other options mentioned above. I hope this article helped you in your Python coding journey. Please subscribe and stay tuned for more interesting articles.
Happy Pythoning! 🐍