Summary: The different methods to split a string using regex are:
- re.split()
- re.sub()
- re.findall()
- re.compile()
Minimal Example
import re

text = "Earth:Moon::Mars:Phobos"

# Method 1
res = re.split(r"[:]+", text)
print(res)

# Method 2
res = re.sub(r":", " ", text).split()
print(res)

# Method 3
res = re.findall(r"[^:\s]+", text)
print(res)

# Method 4
pattern = re.compile(r"[^:\s]+").findall
print(pattern(text))

# Each method prints: ['Earth', 'Moon', 'Mars', 'Phobos']
Problem Formulation
Problem: Given a string and a delimiter, how do you split the string at that delimiter using different functions from Python's regular expressions library (re)?
Example: In the following example, the given string has to be split using an underscore as the delimiter.
# Input
text = "abc_lmn_xyz"
# Expected Output
['abc', 'lmn', 'xyz']
Method 1: re.split
The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

Approach: Use the re.split function and pass [_]+ as the pattern, which splits the given string at every run of one or more underscores.
Code:
import re

text = "abc_lmn_xyz"
res = re.split(r"[_]+", text)
print(res)
# ['abc', 'lmn', 'xyz']
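The choice between a single-character pattern and a repeated one matters once delimiters appear back to back. The following is a minimal sketch, assuming a made-up input string with a doubled underscore (it is not part of the original example); maxsplit is the standard optional argument of re.split.

```python
import re

# Hypothetical input with a doubled underscore (not from the article)
text = "abc__lmn_xyz"

# A single-character pattern yields an empty string between adjacent delimiters
single = re.split(r"_", text)
print(single)  # ['abc', '', 'lmn', 'xyz']

# "_+" treats a run of underscores as one delimiter
run = re.split(r"_+", text)
print(run)  # ['abc', 'lmn', 'xyz']

# maxsplit limits the number of splits; the remainder stays intact
first_only = re.split(r"_+", text, maxsplit=1)
print(first_only)  # ['abc', 'lmn_xyz']
```

If empty fields carry meaning in your data (for example, a missing column), the single-character pattern may be the better fit.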
Related Read: Python Regex Split
Method 2: re.sub
The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.

Approach: Use the re.sub function to replace every underscore with a space, then call the string's split() method, which splits the result at whitespace.
Code:
import re

text = "abc_lmn_xyz"
res = re.sub(r"_", " ", text).split()
print(res)
# ['abc', 'lmn', 'xyz']
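One caveat of this approach is that split() cannot tell the inserted spaces from spaces that were already in the text. The sketch below illustrates this with a hypothetical input whose fields contain spaces (the string is an assumption made for illustration, not taken from the article).

```python
import re

# Hypothetical input whose fields contain spaces (not from the article)
text = "new york_los angeles"

# Replacing "_" with " " and calling split() also breaks on the original spaces
via_sub = re.sub(r"_", " ", text).split()
print(via_sub)  # ['new', 'york', 'los', 'angeles']

# Splitting on the delimiter directly keeps each field whole
via_split = re.split(r"_", text)
print(via_split)  # ['new york', 'los angeles']
```

So the re.sub route is probably best reserved for inputs that are known to contain no whitespace of their own.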
Related Read: Python Regex Sub
Method 3: re.findall
The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.

Approach: Use re.findall() with the pattern [^_\s]+ to find every run of characters that contains neither an underscore nor whitespace. These runs are exactly the pieces separated by the underscores.
Code:
import re

text = "abc_lmn_xyz"
res = re.findall(r"[^_\s]+", text)
print(res)
# ['abc', 'lmn', 'xyz']
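Because re.findall collects the pieces themselves rather than cutting at the delimiters, it behaves differently from re.split when the string starts or ends with a delimiter. A small sketch, assuming a made-up input with leading and trailing underscores:

```python
import re

# Hypothetical input with delimiters at both ends (not from the article)
text = "_abc_lmn_"

# re.split reports the empty strings before the first and after the last delimiter
with_split = re.split(r"_+", text)
print(with_split)  # ['', 'abc', 'lmn', '']

# re.findall only returns the non-empty runs between delimiters
with_findall = re.findall(r"[^_]+", text)
print(with_findall)  # ['abc', 'lmn']
```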
Related Read: Python re.findall()
Method 4: re.compile
The method re.compile(pattern) returns a regular expression object from the pattern that provides basic regex methods such as pattern.search(string), pattern.match(string), and pattern.findall(string). If you match the same pattern multiple times, the explicit two-step approach of (1) compiling and (2) searching the pattern can be more efficient than calling, say, re.search(pattern, string) each time, because it avoids redundant compilations of the same pattern. Note that the re module also caches recently compiled patterns internally, so the gain is usually modest for programs that use only a few patterns.

Code:
import re

text = "abc_lmn_xyz"
# Exclude underscores (the delimiter) and whitespace from each match
pattern = re.compile(r"[^_\s]+").findall
print(pattern(text))
# ['abc', 'lmn', 'xyz']
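A compiled pattern object also has its own split() method, which makes it convenient to reuse one delimiter pattern across many strings. The following is a sketch under stated assumptions: the names splitter and rows, and the sample data, are hypothetical and only serve the illustration.

```python
import re

# Compile the delimiter pattern once (the name "splitter" is illustrative)
splitter = re.compile(r"_+")

# Hypothetical batch of strings that share the same delimiter
rows = ["abc_lmn_xyz", "one__two", "solo"]

# Reuse the same compiled object for every string
parsed = [splitter.split(row) for row in rows]
print(parsed)  # [['abc', 'lmn', 'xyz'], ['one', 'two'], ['solo']]
```

A string without any delimiter comes back as a one-element list, so callers do not need a special case for it.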
Why use re.compile?
- Efficiency: Compiling a regular expression with re.compile() pays off when the expression has to be used more than once. The compiled pattern object can be applied to many different strings without rewriting the expression again and again, which saves effort and time.
- Readability: Another advantage of using re.compile is readability, because it lets you decouple the specification of the regex from the places where it is used.
Read: Is It Worth Using Python's re.compile()?
Exercise
Problem: Use a Python regex to split a string at spaces, commas, and periods, but not in cases like 1,000 or 1.50.
Given: my_string = "one two 3.4 25.6 seven.eight nine,ten"
Expected Output: ["one", "two", "3.4", "25.6", "seven", "eight", "nine", "ten"]
Solution
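import re

my_string = "one two 3.4 25.6 seven.eight nine,ten"
# Split at whitespace, or at a comma/period that has no digit directly before or after it
res = re.split(r'\s|(?<!\d)[,.](?!\d)', my_string)
print(res)
# ['one', 'two', '3.4', '25.6', 'seven', 'eight', 'nine', 'ten']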
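The pattern combines a negative lookbehind (?<!\d) and a negative lookahead (?!\d), so a comma or period only acts as a delimiter when it has no digit directly before it and no digit directly after it. The sketch below applies the same pattern to a made-up string containing 1,000 and 1.50 (the input is an assumption for illustration).

```python
import re

# Same pattern as in the solution above
pattern = r'\s|(?<!\d)[,.](?!\d)'

# Hypothetical input containing formatted numbers (not from the article)
text = "total 1,000 units,each 1.50"

parts = re.split(pattern, text)
print(parts)  # ['total', '1,000', 'units', 'each', '1.50']
```

Note that, as written, the pattern also leaves a comma in place when only one neighbor is a digit, for example in "3, four", where the comma follows a digit.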
Conclusion
In this tutorial, we have learned four different ways of splitting a string using the regular expressions package in Python: re.split(), re.sub() combined with split(), re.findall(), and a compiled pattern from re.compile(). Use whichever technique fits your needs. The idea of this tutorial was to get you acquainted with the numerous ways of using regex to split a string, and I hope it helped you.