Python | Split String by Underscore


Summary: Use "given string".split('_') to split the given string by underscore and store each word as an individual item in a list.

Minimal Example

text = "Welcome_to_the_world_of_Python"
# Method 1
print(text.split("_"))

# Method 2
import re
print(re.split('_', text))

# Method 3
print(list(filter(None, text.split('_'))))

# Method 4
print([x for x in re.findall(r'[^_]*|(?!_).*$', text) if x != ''])

# OUTPUT: ['Welcome', 'to', 'the', 'world', 'of', 'Python']

Problem Formulation

📜Problem: Given a string, how will you split the string into a list of words using the underscore as a delimiter?

Example

Let’s understand the problem with the help of an example.

# Input:
text = "Python_Pycharm_Java_Eclipse_Golang_VisualStudio"
# Output:
['Python', 'Pycharm', 'Java', 'Eclipse', 'Golang', 'VisualStudio']

Now without any further ado, let’s dive into the numerous ways of solving this problem.

Method 1: Using split()

Python’s built-in split() string method splits a string at a given separator and returns a list of substrings. For example, 'finxterx42'.split('x') uses the character ‘x’ as the delimiter and returns the list ['fin', 'ter', '42'].

Approach: To split a string by underscore, pass the underscore as the separator argument, that is, call text.split('_').

Code:

text = "Python_Pycharm_Java_Eclipse_Golang_VisualStudio"
print(text.split("_"))

# ['Python', 'Pycharm', 'Java', 'Eclipse', 'Golang', 'VisualStudio']
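Two split() behaviours are worth knowing before moving on. The following is a minimal sketch with made-up sample strings (record and messy are not part of the examples above): the optional maxsplit argument limits how many splits are made, and leading, trailing, or repeated underscores produce empty strings in the result.

```python
# Hypothetical sample strings, chosen only for illustration.
record = "user_name_first_last"

# maxsplit=1 splits only at the first underscore.
head, rest = record.split("_", 1)
print(head, rest)  # user name_first_last

# rsplit() counts splits from the right-hand side instead.
print(record.rsplit("_", 1))  # ['user_name_first', 'last']

# Leading, trailing, or repeated underscores yield empty strings.
messy = "_a__b_"
print(messy.split("_"))  # ['', 'a', '', 'b', '']
```

If those empty strings are unwanted, they have to be removed in a separate step.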

🌏Related Read: Python String split()

Method 2: Using re.split()

Another way of separating a string at each underscore is to use the re.split() function from Python’s re (regular expression) module. The re.split(pattern, string) function finds all occurrences of the pattern in the string and divides the string at those matches, returning a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

Approach: Call re.split('_', text), where the pattern '_' matches every underscore in the string. The text is separated at each point where an underscore is found.

Code:

import re
text = "Python_Pycharm_Java_Eclipse_Golang_VisualStudio"
print(re.split('_', text))

# ['Python', 'Pycharm', 'Java', 'Eclipse', 'Golang', 'VisualStudio']
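Where re.split() earns its place over str.split() is when the delimiter is a pattern rather than a fixed string. As a small sketch, assuming a hypothetical input with runs of underscores, the pattern '_+' treats each run as a single delimiter:

```python
import re

# Hypothetical input with repeated underscores.
messy = "Python__Pycharm___Java"

# '_+' matches one or more underscores, so each run counts as one delimiter.
parts = re.split(r"_+", messy)
print(parts)  # ['Python', 'Pycharm', 'Java']

# maxsplit limits the number of splits, much like str.split().
first_split = re.split(r"_+", messy, maxsplit=1)
print(first_split)  # ['Python', 'Pycharm___Java']
```

Note that a leading or trailing underscore would still produce an empty string at that end of the list.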

🌏Related Read: Python Regex Split

Method 3: Using filter()

Note: This approach is useful when splitting would otherwise leave empty strings in the resulting list, for example when the text has leading, trailing, or consecutive underscores.

Python’s built-in filter() function keeps the elements that pass a filtering condition and discards the rest. It takes two arguments: function and iterable. The function is applied to each element in the iterable to decide whether the element passes the filter. filter() returns an iterator over the elements that pass the filtering condition.

Approach: Split the string by underscore with split('_') and pass the result to filter(). With None as the first argument and the list of split strings as the second, filter() keeps only the truthy elements, so any empty strings are removed. Because filter() returns an iterator, wrap the call in list() to get a list.

Code:

text = "Violet_Indigo_Blue_Green_Yellow_Orange_Red"
print(list(filter(None, text.split('_'))))

# ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']
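The benefit of filter() only shows up when the split leaves empty strings behind. Here is a minimal sketch with a hypothetical input that has leading, trailing, and doubled underscores:

```python
# Hypothetical input, chosen to produce empty strings when split.
messy = "_Python__Pycharm_"

raw = messy.split("_")
print(raw)  # ['', 'Python', '', 'Pycharm', '']

# filter(None, ...) drops every falsy element, including ''.
cleaned = list(filter(None, raw))
print(cleaned)  # ['Python', 'Pycharm']

# An equivalent list comprehension, if that reads more clearly.
also_cleaned = [part for part in raw if part]
print(also_cleaned)  # ['Python', 'Pycharm']
```

Either form works; the list comprehension avoids the extra list() call.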

🌏Related Read: Python filter()

Method 4: Using re.findall()

The re.findall(pattern, string) function scans the string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of the matched strings in the order in which they were found.

Approach: Instead of matching the delimiter, match the words between the delimiters. The character class [^_] matches any single character that is not an underscore, so the pattern '[^_]*' matches each run of non-underscore characters. Because the * quantifier also allows empty matches, the list comprehension discards the empty strings.

Code:

import re
text = "Python_Pycharm_Java_Eclipse_Golang_VisualStudio"
print([x for x in re.findall(r'[^_]*', text) if x != ''])

# ['Python', 'Pycharm', 'Java', 'Eclipse', 'Golang', 'VisualStudio']
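As a possible simplification (a sketch, not the approach used above), the + quantifier requires at least one character per match, so no empty strings are produced and the filtering step can be dropped. The messy string below is a hypothetical input:

```python
import re

# Hypothetical input with leading, trailing, and doubled underscores.
messy = "_Python__Pycharm_"

# '[^_]+' matches each run of one or more non-underscore characters.
words = re.findall(r"[^_]+", messy)
print(words)  # ['Python', 'Pycharm']
```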

🌏Related Read: Python re.findall() – Everything You Need to Know

Python | Split String by Asterisk

Now that we have gone through numerous ways of solving the given problem, here’s a similar programming challenge for you to solve.

Challenge: You are given a string that contains * in it. How will you split the string using a * as a delimiter? Consider the code below and try to split the string by *.

# Input:
text = "a*b*c"

# Expected Output:
['a', 'b', 'c']

Try to solve the problem yourself before looking into the given solutions.

Solution: Here are the different methods to split a string by using the asterisk as a delimiter/separator:

text = "a*b*c"
# Method 1
print(text.split("*"))

# Method 2
print(list(filter(None, text.split('*'))))

# Method 3
import re
# Inside a character class, * is a literal asterisk and needs no escaping.
print(re.findall(r'[^*]+', text))

# Method 4
# Outside a character class, * is a quantifier, so it must be escaped.
print(re.split(r'\*', text))

# OUTPUT: ['a', 'b', 'c']

Conclusion

Hurrah! We have successfully solved the given problem using as many as four different ways. I hope you enjoyed this article and it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!

