Summary: The different methods to split a string using regex are:
    import re

    text = "Earth:Moon::Mars:Phobos"

    # Method 1
    res = re.split("[:]+", text)
    print(res)

    # Method 2
    res = re.sub(r':', " ", text).split()
    print(res)

    # Method 3
    res = re.findall(r"[^:\s]+", text)
    print(res)

    # Method 4
    pattern = re.compile(r"[^:\s]+")
    print(pattern.findall(text))

    # Output (same for every method)
    # ['Earth', 'Moon', 'Mars', 'Phobos']
📜Problem: Given a string and a delimiter, how do you split the string at that delimiter using different functions from Python's regular expressions library?
Example: In the following example, the given string has to be split using an underscore as the delimiter.
    # Input
    text = "abc_lmn_xyz"
    # Expected Output
    ['abc', 'lmn', 'xyz']
Method 1: re.split
The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
Approach: Use the re.split function and pass [_]+ as the pattern, which splits the given string at every run of one or more underscores.
    import re

    text = "abc_lmn_xyz"
    res = re.split("[_]+", text)
    print(res)
    # ['abc', 'lmn', 'xyz']
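To see why the pattern uses the + quantifier, here is a minimal sketch that compares it with a plain underscore pattern. The input with a doubled underscore and the variable names are made up for illustration; maxsplit is an optional parameter of re.split.

```python
import re

text = "abc__lmn_xyz"  # hypothetical input with a doubled underscore

# Without +, every underscore is its own split point,
# so the doubled underscore leaves an empty string behind.
single = re.split("_", text)

# With +, a run of underscores is treated as one delimiter.
collapsed = re.split("_+", text)

# maxsplit caps the number of splits; the rest of the string stays intact.
first_only = re.split("_+", text, maxsplit=1)

print(single)      # ['abc', '', 'lmn', 'xyz']
print(collapsed)   # ['abc', 'lmn', 'xyz']
print(first_only)  # ['abc', 'lmn_xyz']
```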
🚀Related Read: Python Regex Split
Method 2: re.sub
The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
Approach: The idea here is to use the re.sub function to replace every underscore with a space and then call the string's split() method, which splits the result at whitespace.
    import re

    text = "abc_lmn_xyz"
    res = re.sub(r'_', " ", text).split()
    print(res)
    # ['abc', 'lmn', 'xyz']
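One thing to keep in mind with this approach: because the final split() works on whitespace, any spaces already present in the text also become split points. The sketch below illustrates this with a made-up input; the variable names are only for illustration.

```python
import re

text = "new york_los angeles_rome"  # hypothetical input that already contains spaces

# Method 2: underscores become spaces, then everything is split at whitespace.
via_sub = re.sub(r"_", " ", text).split()

# Splitting directly at the underscore keeps the multi-word fields together.
via_split = re.split(r"_", text)

print(via_sub)    # ['new', 'york', 'los', 'angeles', 'rome']
print(via_split)  # ['new york', 'los angeles', 'rome']
```

If the fields may contain spaces, re.split is probably the safer choice.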
🚀Related Read: Python Regex Sub
Method 3: re.findall
The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns the matches as a list of strings in the order in which they were found.
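Approach: Find all occurrences of characters that are separated by underscores using the re.findall function. The pattern [^_\s]+ matches every run of characters that are neither underscores nor whitespace.
    import re

    text = "abc_lmn_xyz"
    res = re.findall(r"[^_\s]+", text)
    print(res)
    # ['abc', 'lmn', 'xyz']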
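A practical difference between collecting matches and splitting shows up when the string starts or ends with the delimiter. The following sketch uses a made-up input with leading and trailing underscores; the variable names are illustrative.

```python
import re

text = "_abc_lmn_xyz_"  # hypothetical input with leading and trailing underscores

# re.split reports the empty strings before the first and after the last delimiter.
with_split = re.split("_+", text)

# re.findall only returns what the pattern matched, so no empty strings appear.
with_findall = re.findall(r"[^_\s]+", text)

print(with_split)    # ['', 'abc', 'lmn', 'xyz', '']
print(with_findall)  # ['abc', 'lmn', 'xyz']
```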
🚀Related Read: Python re.findall()
Method 4: re.compile
re.compile(pattern) returns a regular expression object from the pattern that provides basic regex methods such as pattern.findall(string). The explicit two-step approach of (1) compiling and (2) searching the pattern can be more efficient than calling, say, re.search(pattern, string) directly if you match the same pattern multiple times, because it avoids redundant compilations of the same pattern.
    import re

    text = "abc_lmn_xyz"
    # Compile once; the character class must exclude the underscore delimiter
    pattern = re.compile(r"[^_\s]+")
    print(pattern.findall(text))
    # ['abc', 'lmn', 'xyz']
Why use re.compile?
- Efficiency: Compiling a regular expression with re.compile() pays off when the expression has to be used more than once. The compiled pattern object can be reused to search many different strings without rewriting the expression again and again, which saves effort and time.
- Readability: Another advantage of using re.compile is readability, since it lets you separate the definition of the regex from the places where it is used.
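Both points can be illustrated with a small sketch: the pattern is defined once under a descriptive name and then reused for several strings. The names FIELD, DELIM, and lines are hypothetical and chosen only for this example.

```python
import re

# Define the patterns once, with names that say what they match.
FIELD = re.compile(r"[^_\s]+")  # a run of non-underscore, non-whitespace characters
DELIM = re.compile(r"_+")       # a run of one or more underscores

lines = ["abc_lmn_xyz", "red_green", "one"]  # hypothetical inputs

# Reuse the same compiled pattern object for every string.
rows = [FIELD.findall(line) for line in lines]
print(rows)  # [['abc', 'lmn', 'xyz'], ['red', 'green'], ['one']]

# Compiled patterns also offer split(), mirroring re.split.
parts = DELIM.split("abc__lmn_xyz")
print(parts)  # ['abc', 'lmn', 'xyz']
```

Note that the re module also keeps an internal cache of recently compiled patterns, so in small scripts the speed difference is often minor and the naming benefit tends to matter more.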
🚀Read: Is It Worth Using Python’s re.compile()?
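Problem: Split a string at spaces, commas, and periods, but not in cases like 1,000 or 1.50, where the comma or period is part of a number.
Input:
    my_string = "one two 3.4 25.6 seven.eight nine,ten"
Expected output:
    ["one", "two", "3.4", "25.6", "seven", "eight", "nine", "ten"]
Solution: Split at any whitespace character, or at a comma or period that has a digit neither directly before nor directly after it. The negative lookbehind (?<!\d) and the negative lookahead (?!\d) enforce this condition.
    import re

    my_string = "one two 3.4 25.6 seven.eight nine,ten"
    res = re.split(r'\s|(?<!\d)[,.](?!\d)', my_string)
    print(res)
    # ['one', 'two', '3.4', '25.6', 'seven', 'eight', 'nine', 'ten']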
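As a quick check of the lookaround pattern against the cases named in the problem statement, here is a small sketch. The sample strings and variable names are made up for illustration. It also shows that punctuation followed by a space produces an empty string, which can be filtered out afterwards.

```python
import re

pattern = re.compile(r"\s|(?<!\d)[,.](?!\d)")

# Commas and periods next to digits are left alone.
sample = "price 1,000 or 1.50 each,today"  # hypothetical input
parts = pattern.split(sample)
print(parts)  # ['price', '1,000', 'or', '1.50', 'each', 'today']

# A comma followed by a space gives two adjacent split points,
# which leaves an empty string between them.
spaced = pattern.split("one, two")
print(spaced)  # ['one', '', 'two']

# Dropping empty strings is one simple way to clean this up.
cleaned = [p for p in spaced if p]
print(cleaned)  # ['one', 'two']
```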
We have now covered four different ways of splitting a string using the regular expressions package in Python. Feel free to use the technique that best fits your needs. The idea of this tutorial was to get you acquainted with the numerous ways of using regex to split a string, and I hope it helped you.
Please stay tuned and subscribe for more interesting discussions and tutorials in the future. Happy coding! 🙂