Summary: Use Python’s built-in split() string method to split a given string into a list of substrings. Other approaches include the regex library (re) and the map function.
Minimal Example
text = "Python Java Golang"

# Method 1
print(text.split())

# Method 2
import re
print(re.split(r'\s+', text))

# Method 3
print(re.findall(r'\S+', text))

# Method 4
li = list(map(str.strip, text.split()))
res = []
for i in li:
    for j in i.split():
        res.append(j)
print(res)

# OUTPUT (each method): ['Python', 'Java', 'Golang']
Problem Formulation
Problem: Given a string containing several substrings separated by whitespace, how do you split it into a list of those substrings?
Let’s understand the problem with the help of an example.
Example
# Input
text = "word1 word2 word3 word4 word5"
# Output
['word1', 'word2', 'word3', 'word4', 'word5']
Method 1: Using split
Approach: Use the split(sep) method, where sep is the separator. Here the substrings are separated by spaces, so you do not need to pass a separator at all: when called without arguments, split treats any run of whitespace as a single separator and ignores leading and trailing whitespace. The string is therefore split wherever whitespace occurs, and the resulting substrings are returned in a list.
Code:
text = "word1 word2 word3 word4 word5"
print(text.split())
# ['word1', 'word2', 'word3', 'word4', 'word5']
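To see why calling split() without arguments is usually the more forgiving choice, the sketch below compares it with an explicit single-space separator. The string messy is a made-up input with irregular spacing, not an example from this article.

```python
# Illustrative input (an assumption, not from the article):
# leading/trailing spaces, repeated spaces, and a tab.
messy = "  word1   word2\tword3 "

# No argument: runs of any whitespace act as one separator,
# and leading/trailing whitespace is ignored.
default_split = messy.split()

# Explicit ' ' separator: every single space is a boundary,
# so empty strings appear and the tab is not treated as a separator.
space_split = messy.split(" ")

print(default_split)  # ['word1', 'word2', 'word3']
print(space_split)    # ['', '', 'word1', '', '', 'word2\tword3', '']
```

If your input may contain tabs, newlines, or repeated spaces, the argument-free form is likely what you want.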
Method 2: Using re.split
The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
Approach: Use re.split(r'\s+', text), where text is the given string and the pattern '\s+' matches one or more consecutive whitespace characters. The r prefix makes the pattern a raw string, so the backslash reaches the regex engine unchanged. The string is therefore split at every run of whitespace.
Code:
import re

text = "word1 word2 word3 word4 word5"
print(re.split(r'\s+', text))
# ['word1', 'word2', 'word3', 'word4', 'word5']
Related Read: Python Regex Split
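One detail worth knowing about re.split: if the string starts or ends with a match, the result contains empty strings at those positions. The sketch below shows this on a made-up padded input and one possible workaround (stripping the string first); it is an illustration, not the only way to handle it.

```python
import re

# Illustrative input (an assumption, not from the article)
# with leading and trailing whitespace.
padded = "  word1 word2\nword3  "

# The pattern matches at the very start and end, so empty strings appear there.
raw_parts = re.split(r"\s+", padded)

# Stripping the string first removes those boundary matches.
clean_parts = re.split(r"\s+", padded.strip())

print(raw_parts)    # ['', 'word1', 'word2', 'word3', '']
print(clean_parts)  # ['word1', 'word2', 'word3']
```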
Method 3: Using re.findall
The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.
Related Read: Python re.findall() - Everything You Need to Know
Approach: Use re.findall(r'\S+', text), where text is the given string and the pattern '\S+' matches one or more consecutive non-whitespace characters. Each match extends until the next whitespace character, where it ends; the scan then resumes at the next non-whitespace character. Strictly speaking, nothing is split here: the list is built from the matched groups of characters themselves.
Code:
import re

text = "word1 word2 word3 word4 word5"
print(re.findall(r'\S+', text))
# ['word1', 'word2', 'word3', 'word4', 'word5']
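Because re.findall collects matches rather than cutting at separators, it never produces empty strings for padded or blank input. The following sketch contrasts it with re.split on two made-up strings (padded and blank are illustrative names, not from the article).

```python
import re

# Illustrative inputs (assumptions, not from the article).
padded = "  word1 word2\nword3  "
blank = "   "

# findall returns only the runs of non-whitespace characters.
found_padded = re.findall(r"\S+", padded)
found_blank = re.findall(r"\S+", blank)

# For comparison: re.split on a whitespace-only string yields two empty strings.
split_blank = re.split(r"\s+", blank)

print(found_padded)  # ['word1', 'word2', 'word3']
print(found_blank)   # []
print(split_blank)   # ['', '']
```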
Method 4: Using map
Prerequisite: The map() function transforms one or more iterables into a new one by applying a "transformator function" to the i-th elements of each iterable. The arguments are the transformator function object and one or more iterables. If you pass n iterables as arguments, the transformator function must be an n-ary function taking n input arguments. The return value is an iterable map object of transformed, and possibly aggregated, elements.
Related Read: Python map() - Finally Mastering the Python Map Function [+Video]
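As a quick illustration of the map() behavior described above, here is a minimal sketch with one iterable (a unary function) and with two iterables (a binary function). The lists words and suffixes are made up for this example.

```python
# Illustrative data (assumptions, not from the article).
words = ["word1", "word2", "word3"]
suffixes = ["!", "?", "."]

# One iterable: the function receives one argument per call.
upper = list(map(str.upper, words))

# Two iterables: the function receives the i-th element of each.
joined = list(map(lambda w, s: w + s, words, suffixes))

print(upper)   # ['WORD1', 'WORD2', 'WORD3']
print(joined)  # ['word1!', 'word2?', 'word3.']
```

Note that map returns a lazy map object, which is why both results are wrapped in list() before printing.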
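Approach: Call map with str.strip as the function and the list returned by text.split() as the iterable (the second argument of map). Each item in the list is passed to strip, which removes any leading and trailing whitespace, and map returns a map object containing the cleaned substrings. You can convert this map object to a list using the list constructor. Note that split() without arguments already discards surrounding whitespace, so for this input strip and the nested loop in the code leave the items unchanged; they matter only when the items can still contain whitespace, for example after splitting on an explicit separator.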
Code:
text = "word1 word2 word3 word4 word5"
li = list(map(str.strip, text.split()))
res = []
for i in li:
    for j in i.split():
        res.append(j)
print(res)
# ['word1', 'word2', 'word3', 'word4', 'word5']
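The combination of map and str.strip becomes more useful when you split on an explicit separator, because the pieces can then keep their surrounding spaces. A minimal sketch, assuming a made-up semicolon-separated string:

```python
# Illustrative input (an assumption, not from the article):
# items separated by ';' with inconsistent spacing.
raw = "word1 ; word2 ;word3"

# Splitting on ';' keeps the spaces around each piece.
pieces = raw.split(";")

# map applies str.strip to every piece and removes that padding.
cleaned = list(map(str.strip, pieces))

print(pieces)   # ['word1 ', ' word2 ', 'word3']
print(cleaned)  # ['word1', 'word2', 'word3']
```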
Exercise
Problem: Given a string containing several substrings separated by commas and spaces, how do you extract the substrings and store them in a list? Note that you have to eliminate the whitespace as well as the commas.
# Input
text = "One, Two, Three"
# Output
['One', 'Two', 'Three']
Hint: Python | Split String by Comma and Whitespace
Solution:
text = "One, Two, Three"
print([x.strip() for x in text.split(',')])
# ['One', 'Two', 'Three']
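An alternative way to approach the exercise is a regex-based sketch that treats any run of commas and whitespace as one separator. The helper name split_on_commas_and_spaces is hypothetical, and the second input is a made-up messy variant of the exercise string.

```python
import re

def split_on_commas_and_spaces(s):
    """Hypothetical helper: split on runs of commas and/or whitespace."""
    # Filter out the empty strings that appear when the input
    # starts or ends with a separator.
    return [part for part in re.split(r"[,\s]+", s) if part]

tidy = split_on_commas_and_spaces("One, Two, Three")
messy = split_on_commas_and_spaces(" One,Two ,  Three, ")

print(tidy)   # ['One', 'Two', 'Three']
print(messy)  # ['One', 'Two', 'Three']
```

One trade-off: this version also splits items that contain internal spaces (such as "New York"), whereas splitting on the comma and stripping each piece keeps them intact.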
Conclusion
With that, we come to the end of this tutorial. I hope the methods discussed in this article have helped you and answered your queries. Please stay tuned and subscribe for more solutions and discussions in the future.
Happy learning!