Summary: There are 3 ways of splitting the given string by tab:
(i) Using split()
(ii) Using re.split()
(iii) Using re.compile() and findall()
Minimal Example
import re

num = "123\t45\t\t6\t789"

# Method 1
print(num.split())
# OUTPUT: ['123', '45', '6', '789']

# Method 2
print(re.split(r'\t+', num))
# OUTPUT: ['123', '45', '6', '789']

# Method 3
print(re.compile("[^\t]+").findall(num))
# OUTPUT: ['123', '45', '6', '789']
Problem Formulation
Problem: Given a string, how will you split it by tab?
Example
Let us visualize the problem with the help of an example:
# Input
text = "abc\t\txy\tcda\t\tmnop"
# Expected output
['abc', 'xy', 'cda', 'mnop']
Now that we have an overview of the problem, let us dive into the solutions.
Method 1: Using split()
Approach: When you call split() without passing a delimiter, any run of whitespace is treated as a delimiter by default. You can use this to your advantage and simply call split() on the given string without arguments. Note that this also splits on spaces and newlines, so it fits only when the fields themselves contain no whitespace.
Code:
text = "abc\t\txy\tcda\t\tmnop"
print(text.split())
# ['abc', 'xy', 'cda', 'mnop']
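To illustrate the whitespace behaviour described above, here is a minimal sketch with a made-up string (record is a hypothetical name, not part of the original example) whose first field contains a space. It compares split() without arguments to split('\t'), which splits on single tabs only.

```python
# Hypothetical sample: the first field contains a space.
record = "New York\t\tParis\tRome"

# No argument: splits on every run of whitespace, including the space.
by_whitespace = record.split()
print(by_whitespace)
# ['New', 'York', 'Paris', 'Rome']

# Explicit tab: keeps "New York" intact, but two consecutive tabs
# produce an empty string between them.
by_tab = record.split("\t")
print(by_tab)
# ['New York', '', 'Paris', 'Rome']
```

If your fields may contain spaces, the tab-specific approaches are probably the safer choice.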
Related Read: Python String split()
Method 2: Using re.split()
Prerequisite: The re.split(pattern, string, maxsplit=0, flags=0) function returns a list of strings by matching all occurrences of the pattern in the string and splitting the string at those matches.
Read More: Python Regex Split
Approach: Use Python’s re module and call its split() function with two arguments. The first argument is the pattern to match while splitting. In this case, it is a simple tab, so use the expression \t+, which matches one or more consecutive tabs. The second argument is the given string itself. That’s it!
Code:
import re

text = "abc\t\txy\tcda\t\tmnop"
print(re.split(r'\t+', text))
# ['abc', 'xy', 'cda', 'mnop']
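The signature of re.split() also includes the optional maxsplit parameter. The following sketch shows its effect on the same text, and also what happens with a leading or trailing tab (the padded string is a made-up input for illustration).

```python
import re

text = "abc\t\txy\tcda\t\tmnop"

# maxsplit=1 stops after the first split; the remainder is left untouched.
first_and_rest = re.split(r"\t+", text, maxsplit=1)
print(first_and_rest)
# ['abc', 'xy\tcda\t\tmnop']

# A leading or trailing tab still produces an empty string at that end.
padded = re.split(r"\t+", "\tabc\txy\t")
print(padded)
# ['', 'abc', 'xy', '']
```

So \t+ collapses repeated tabs in the middle of the string, but it does not remove the empty strings caused by tabs at the very start or end.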
Method 3: Using re.compile()
The function re.compile(pattern) returns a regular expression object from the pattern that provides basic regex methods such as pattern.search(string), pattern.match(string), and pattern.findall(string). The explicit two-step approach of (1) compiling and (2) searching the pattern can be more efficient than calling, say, re.search(pattern, string) directly if you match the same pattern multiple times, because it avoids redundant compilations of the same pattern.
Approach: Compile the pattern [^\t]+, which matches one or more consecutive characters that are not a tab, and call findall() on the given string to collect every such run in a list.
Read More: Python Regex Compile
Code:
import re

text = "abc\t\txy\tcda\t\tmnop"
print(re.compile("[^\t]+").findall(text))
# ['abc', 'xy', 'cda', 'mnop']
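Compiling pays off mainly when the same pattern is applied to many strings. Here is a minimal sketch of that reuse, assuming a few made-up rows such as you might read from a tab-separated file (rows and non_tab_run are hypothetical names).

```python
import re

# Compile once: one or more characters that are not a tab.
non_tab_run = re.compile(r"[^\t]+")

# Hypothetical input lines, e.g. rows of a tab-separated file.
rows = ["id\tname\t\tcity", "\t1\tAda\tLondon\t", "2\t\tLinus\tHelsinki"]

# Reuse the compiled pattern for every row.
parsed = [non_tab_run.findall(row) for row in rows]
for fields in parsed:
    print(fields)
# ['id', 'name', 'city']
# ['1', 'Ada', 'London']
# ['2', 'Linus', 'Helsinki']
```

Because findall() only returns the matched runs of non-tab characters, leading, trailing, and repeated tabs never produce empty strings with this approach.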
Exercise
Given a string containing tabs at the start, middle, and end, how will you split it using a tab as the delimiter? Note that the resulting list must not contain empty strings.
Challenge: Consider the code given below. The output contains empty strings. Can you eliminate the empty strings from the list?
# Given
colours = '\tRed\tBlack\tYellow\tBlue\t'
print(colours.split('\t'))
# Output
['', 'Red', 'Black', 'Yellow', 'Blue', '']
# Expected Output
['Red', 'Black', 'Yellow', 'Blue']
Solution: The filter() function can be used to remove the empty strings from the list. Pass None as the first argument and the list of split strings as the second argument; filter() then iterates through the list and drops every falsy element, which includes the empty strings. As filter() returns a filter object (an iterator), wrap it in list() to convert it into a list that can be printed in a human-readable form.
colours = '\tRed\tBlack\tYellow\tBlue\t'
res = list(filter(None, colours.split('\t')))
print(res)
# ['Red', 'Black', 'Yellow', 'Blue']
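For comparison, here are two other ways to reach the same result. These are alternatives sketched for illustration, not part of the original solution: a list comprehension, and stripping the outer tabs before splitting with re.split().

```python
import re

colours = '\tRed\tBlack\tYellow\tBlue\t'

# Option A: a list comprehension that keeps only non-empty pieces.
res_a = [part for part in colours.split('\t') if part]

# Option B: strip tabs from both ends first, then split on runs of tabs.
res_b = re.split(r'\t+', colours.strip('\t'))

print(res_a)  # ['Red', 'Black', 'Yellow', 'Blue']
print(res_b)  # ['Red', 'Black', 'Yellow', 'Blue']
```

Option A behaves like filter(None, ...) on a list of strings. Option B assumes the string contains at least one non-tab character; for a string made only of tabs it would return [''].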
Note: Python’s built-in filter() function keeps the elements that pass a filtering condition and discards the rest. It takes two arguments: function and iterable. The function assigns a Boolean value to each element in the iterable to decide whether the element passes the filter. If function is None, the elements themselves are tested for truthiness. filter() returns an iterator over the elements that pass the filtering condition.
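A short sketch of both forms of filter(), using a made-up list (parts) and a hypothetical filtering rule chosen only for illustration:

```python
parts = ['', 'Red', 'Black', '', 'Blue']

# With None as the function, filter() keeps the truthy elements,
# so the empty strings are dropped.
truthy = list(filter(None, parts))
print(truthy)  # ['Red', 'Black', 'Blue']

# With a custom function (hypothetical rule: names starting with 'B').
starts_with_b = list(filter(lambda s: s.startswith('B'), parts))
print(starts_with_b)  # ['Black', 'Blue']
```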
Read More: Python filter()
Conclusion
Hurrah! We have successfully solved the given problem using as many as three different ways. I hope you enjoyed this article and it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!