Python | Split String by Tab


Summary: There are three ways to split a given string by tab:
(i) Using split()
(ii) Using re.split()
(iii) Using re.compile and re.findall

Minimal Example

import re
num = "123\t45\t\t6\t789"
# Method 1
print(num.split())
# OUTPUT: ['123', '45', '6', '789']

# Method 2
print(re.split(r'\t+', num))
# OUTPUT: ['123', '45', '6', '789']

# Method 3
print(re.compile("[^\t]+").findall(num))
# OUTPUT: ['123', '45', '6', '789']

Problem Formulation

📜Problem: Given a string, how will you split it by tab?

Example

Let us visualize the problem with the help of an example:

# input
text = "abc\t\txy\tcda\t\tmnop"
# Expected output
['abc', 'xy', 'cda', 'mnop']

Now that we have an overview of our problem, let us dive into the solutions without further ado.

Method 1: Using split()

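Approach: When you call split() without passing a delimiter, any run of whitespace (spaces, tabs, newlines) is treated as a single separator, and leading or trailing whitespace is ignored. Because a tab is whitespace, you can simply call split() on the given string without any argument. Note that this also splits on spaces, so it only works as a tab splitter when the fields themselves contain no spaces.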

Code:

text = "abc\t\txy\tcda\t\tmnop"
print(text.split())
# ['abc', 'xy', 'cda', 'mnop']
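Keep in mind that split() with no argument breaks on every kind of whitespace, not only tabs. If your fields can contain spaces, the minimal sketch below (the cities string is a made-up sample) shows why an explicit split('\t') plus a filter for empty strings may be the safer choice:

```python
# Hypothetical sample: tab-separated fields that themselves contain spaces
cities = "New York\t\tLos Angeles\tSan Francisco"

# No argument: splits on ANY whitespace run, so multi-word fields break apart
print(cities.split())
# ['New', 'York', 'Los', 'Angeles', 'San', 'Francisco']

# Explicit tab: keeps the spaces, but consecutive tabs yield an empty string
print(cities.split('\t'))
# ['New York', '', 'Los Angeles', 'San Francisco']

# Drop the empty strings to get one entry per field
print([field for field in cities.split('\t') if field])
# ['New York', 'Los Angeles', 'San Francisco']
```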

🌎Related Read: Python String split()

Method 2: Using re.split()

Prerequisite: The re.split(pattern, string, maxsplit=0, flags=0) function finds all occurrences of the pattern in the string, splits the string at each match, and returns the resulting pieces as a list of strings.

🌎Read More: Python Regex Split

Approach: Import Python's re module and call re.split() with two arguments. The first argument is the pattern to split on. Here the separator is a tab, but splitting on a single \t would produce an empty string wherever two tabs are adjacent, so use the pattern \t+, which matches one or more consecutive tabs. The second argument is the string to split. That's it!

Code:

import re
text = "abc\t\txy\tcda\t\tmnop"
print(re.split(r'\t+', text))

# ['abc', 'xy', 'cda', 'mnop']
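Two details of re.split() are worth knowing. First, if the string starts or ends with a tab, the result still contains an empty string at that end, because the pattern matches at the very start or end. Second, the optional maxsplit argument limits how many splits are performed. A small sketch, with sample strings chosen for illustration:

```python
import re

text = "abc\t\txy\tcda\t\tmnop"

# maxsplit=1: split only at the first run of tabs, keep the rest intact
print(re.split(r'\t+', text, maxsplit=1))
# ['abc', 'xy\tcda\t\tmnop']

# Leading/trailing tabs still produce empty strings at the edges
padded = "\tRed\tBlue\t"
print(re.split(r'\t+', padded))
# ['', 'Red', 'Blue', '']

# Stripping tabs from both ends first avoids those edge cases
print(re.split(r'\t+', padded.strip('\t')))
# ['Red', 'Blue']
```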

Method 3: Using re.compile()

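The re.compile(pattern) function returns a regular expression object that provides basic regex methods such as pattern.search(string), pattern.match(string), and pattern.findall(string). The explicit two-step approach of (1) compiling and (2) searching can be more efficient than calling, say, re.search(pattern, string) directly when you match the same pattern many times, because it avoids recompiling the pattern. (Python's re module also caches recently used patterns internally, so the gain is often modest.)

Approach: Compile the pattern [^\t]+, which matches one or more consecutive characters that are not tabs, and call findall() on the compiled object to collect every such run in the string.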

🌎Read More: Python Regex Compile

Code:

import re
text = "abc\t\txy\tcda\t\tmnop"
print(re.compile("[^\t]+").findall(text))
# ['abc', 'xy', 'cda', 'mnop']
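Unlike re.split(), findall() with [^\t]+ returns only the non-empty runs, so leading and trailing tabs never create empty strings. Since the compiled pattern is a reusable object, you can compile it once and apply it to many lines, which is where compiling pays off. A minimal sketch; the TAB_FIELDS name and the rows list are made up for illustration:

```python
import re

# Compile once: one or more consecutive characters that are not tabs
TAB_FIELDS = re.compile(r'[^\t]+')

rows = ["abc\t\txy", "\tcda\t\tmnop\t", "single"]

# Reuse the same compiled object for every row
for row in rows:
    print(TAB_FIELDS.findall(row))
# ['abc', 'xy']
# ['cda', 'mnop']
# ['single']
```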

Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.

Exercise

Suppose you are given a string that contains tabs at the start, in the middle, and at the end. How will you split it using a tab as the delimiter? Note that the resulting list must not contain empty strings.

Challenge: Consider the code given below. The output contains empty strings. Can you eliminate the empty strings from the list?

# Given 
colours = '\tRed\tBlack\tYellow\tBlue\t'
print(colours.split('\t'))
# Output
['', 'Red', 'Black', 'Yellow', 'Blue', '']
# Expected Output
['Red', 'Black', 'Yellow', 'Blue']

Solution: Use the built-in filter() function to remove the empty strings from the list. Pass None as the first argument and the list returned by split('\t') as the second. With None, filter() keeps only the truthy elements, so every empty string is dropped. Because filter() returns a lazy filter object rather than a list, wrap it in list() to get a list you can print or reuse.

colours = '\tRed\tBlack\tYellow\tBlue\t'
res = list(filter(None, colours.split('\t')))
print(res)
# ['Red', 'Black', 'Yellow', 'Blue']
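For comparison, two of the earlier methods already avoid the empty strings for this input, while re.split() does not, since the string begins and ends with a tab. A quick check, assuming the same colours string (Method 1 still carries the caveat that it would also split on spaces):

```python
import re

colours = '\tRed\tBlack\tYellow\tBlue\t'

# Method 1: no-argument split() ignores leading/trailing whitespace
print(colours.split())
# ['Red', 'Black', 'Yellow', 'Blue']

# Method 3: findall() only returns non-empty runs of non-tab characters
print(re.findall(r'[^\t]+', colours))
# ['Red', 'Black', 'Yellow', 'Blue']

# Method 2: re.split() keeps empty strings at both ends
print(re.split(r'\t+', colours))
# ['', 'Red', 'Black', 'Yellow', 'Blue', '']
```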

Note: Python's built-in filter() function keeps the elements of an iterable that pass a filtering condition. It takes two arguments: function and iterable. The function is called on each element, and its return value, interpreted as a Boolean, decides whether the element passes the filter; if function is None, the elements themselves are tested for truthiness. filter() returns an iterator over the elements that pass.
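Passing None as the function is shorthand for keeping only truthy elements; for strings, that means dropping ''. The sketch below writes the same filter with an explicit predicate and as a list comprehension, which some Python developers find more readable (the choice is largely a matter of style):

```python
colours = '\tRed\tBlack\tYellow\tBlue\t'
parts = colours.split('\t')

# filter(None, ...) keeps elements that are truthy (non-empty strings)
with_none = list(filter(None, parts))

# Equivalent: an explicit predicate that returns True for non-empty strings
with_lambda = list(filter(lambda s: s != '', parts))

# Equivalent list comprehension
with_comprehension = [s for s in parts if s]

print(with_none == with_lambda == with_comprehension)
# True
print(with_none)
# ['Red', 'Black', 'Yellow', 'Blue']
```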

🌎Read More: Python filter()

Conclusion

Hurrah! We have successfully solved the given problem in three different ways. I hope you enjoyed this article and that it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!

