Python Regex Split Without Empty String

Problem Formulation

Suppose you use the re.split(pattern, string) function to split a string on all occurrences of a given pattern. If the pattern matches at the beginning or the end of the string, the resulting list contains empty strings. How can you get rid of those empty strings automatically?

Here’s an example:

import re

s = '--hello-world_how    are\tyou-----------today\t'

words = re.split(r'[-_\s]+', s)
print(words)
# ['', 'hello', 'world', 'how', 'are', 'you', 'today', '']

Note the empty strings in the resulting list.

Background

The re.split(pattern, string) function finds all matches of the pattern in the string and splits the string at those matches, returning a list of the substrings between them. For example, re.split('a', 'bbabbbab') returns ['bb', 'bbb', 'b'], whereas re.split('a', 'abbabbbaba') returns ['', 'bb', 'bbb', 'b', '']. The empty strings appear because the string starts and ends with a match, so there is nothing before the first match and nothing after the last one.
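
To make the source of the empty strings concrete, here is a minimal sketch with three short inputs (chosen for illustration, they are not from the example above). It also shows a case not covered so far: two matches directly next to each other leave an empty string between them.

```python
import re

leading = re.split('a', 'abb')      # match at the very start
trailing = re.split('a', 'bba')     # match at the very end
adjacent = re.split('a', 'bbaabb')  # two matches next to each other

print(leading)   # ['', 'bb']
print(trailing)  # ['bb', '']
print(adjacent)  # ['bb', '', 'bb']
```

A quantified pattern such as 'a+' would collapse the adjacent matches into one, but it would not help with matches at the start or end of the string.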

Related Article: Python Regex Split

Method 1: Remove All Empty Strings From the List Using List Comprehension

The simplest solution is to filter the empty strings out of the resulting list with a list comprehension such as [x for x in words if x != ''].

import re

s = '--hello-world_how    are\tyou-----------today\t'

# Method 1: Remove all empty strings from the list
words = re.split(r'[-_\s]+', s)
words = [x for x in words if x != '']
print(words)
# ['hello', 'world', 'how', 'are', 'you', 'today']

Method 2: Remove All Empty Strings From the List Using filter()

An alternative is to pass the resulting list through filter(), for example filter(bool, words). This removes the empty string '' along with any other element that evaluates to False, such as None. Because filter() returns an iterator, wrap the call in list() to get a list back.

import re

s = '--hello-world_how    are\tyou-----------today\t'

# Method 2: Remove empty strings from the list using filter()
words = re.split(r'[-_\s]+', s)
words = list(filter(bool, words))
print(words)
# ['hello', 'world', 'how', 'are', 'you', 'today']
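
Methods 1 and 2 give the same result here, but they are not interchangeable in general. The following sketch uses a hypothetical pattern and input (not from the example above) in which re.split() returns a None entry, which happens when the pattern contains a capturing group that does not take part in a match. The comparison x != '' keeps that None, while filter(bool, ...) drops it.

```python
import re

# Hypothetical pattern: the group (-) is captured, the alternative _ is not.
# When the match is '_', the group does not participate and yields None.
parts = re.split(r'(-)|_', 'a-b_c')
print(parts)      # ['a', '-', 'b', None, 'c']

not_empty = [x for x in parts if x != '']   # Method 1 style
truthy = list(filter(bool, parts))          # Method 2 style

print(not_empty)  # ['a', '-', 'b', None, 'c']
print(truthy)     # ['a', '-', 'b', 'c']
```

Which behavior is preferable depends on whether None entries carry meaning in your data.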

Method 3: Use re.findall() Instead

A simple and Pythonic alternative is to use re.findall(pattern, string) with the inverse of the split pattern. If pattern A describes the separators, a pattern that matches everything A does not match can be passed to re.findall() to retrieve the pieces of the split directly. This works best when the split pattern is a simple character class, because a character class is easy to negate.

Here’s the example that uses the negated character class [^-_\s]+ to find all runs of characters that are not separators:

import re

s = '--hello-world_how    are\tyou-----------today\t'

# Method 3: Use re.findall()
words = re.findall(r'[^-_\s]+', s)
print(words)

The result is the same split list:

['hello', 'world', 'how', 'are', 'you', 'today']
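
As a sanity check, the three methods can be wrapped in small functions and compared on a few inputs, including edge cases such as an empty string and a string made up only of separators. This is a minimal sketch: the function names, the constant names, and the extra sample strings are hypothetical and chosen only for illustration.

```python
import re

SPLIT_PATTERN = r'[-_\s]+'   # separators, as in the examples above
FIND_PATTERN = r'[^-_\s]+'   # negated class: runs of non-separators


def split_comprehension(s):
    """Method 1: split, then drop empty strings with a list comprehension."""
    return [x for x in re.split(SPLIT_PATTERN, s) if x != '']


def split_filter(s):
    """Method 2: split, then drop falsy entries with filter()."""
    return list(filter(bool, re.split(SPLIT_PATTERN, s)))


def split_findall(s):
    """Method 3: collect the non-separator runs directly."""
    return re.findall(FIND_PATTERN, s)


samples = [
    '--hello-world_how    are\tyou-----------today\t',
    '',        # empty input
    '---',     # separators only
    'hello',   # no separators at all
]

for sample in samples:
    expected = split_comprehension(sample)
    assert split_filter(sample) == expected
    assert split_findall(sample) == expected
    print(repr(sample), '->', expected)
```

For this pattern all three functions agree on every sample, so the choice between them comes down to readability.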
