Python | Split String by Hyphen


⭐Summary: Use "given string".split('-') to split the given string by hyphen and store each word as an individual item in a list. Other ways to split by hyphen include using a list comprehension and the regex library.

Minimal Example

text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"
# Method 1
print(text.split("-"))
# Method 2
import re
print(re.split('-', text))
# Method 3
print(list(filter(None, text.split('-'))))
# Method 4
print(re.findall(r'[^-]+', text))

# OUTPUT: ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']

Problem Formulation

📜Problem: Given a string, how will you split the string into a list of words using the hyphen as a delimiter?

Example

Let’s understand the problem with the help of an example.

# Input:
text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"
# Output:
['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']

Now, without any further ado, let’s dive into the numerous ways of solving this problem.

Method 1: Using split()

Python’s built-in split() string method splits the string at a given separator and returns a list of substrings. Here’s how the split() method works: 'finxterx42'.split('x') will split the string with the character ‘x’ as the delimiter and return the following list as an output: ['fin', 'ter', '42'].

Approach: To split a string by hyphen, you can simply pass the hyphen as the separator to the split() method, that is, text.split('-').

Code:

text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"
print(text.split("-"))

# ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']
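
It can help to see how split("-") behaves on less tidy input. The short sketch below uses a few made-up sample strings (they are illustrative assumptions, not part of the original example) to show that a string without a hyphen comes back as a one-item list, and that consecutive, leading, or trailing hyphens produce empty strings:

```python
# Illustrative sample strings (assumed for this sketch only).
samples = {
    "no hyphen": "Violet",
    "consecutive": "Violet--Indigo",
    "leading and trailing": "-Violet-Indigo-",
}

# Split every sample on the hyphen and keep the results by label.
results = {label: s.split("-") for label, s in samples.items()}

for label, parts in results.items():
    print(f"{label}: {parts}")

# no hyphen: ['Violet']
# consecutive: ['Violet', '', 'Indigo']
# leading and trailing: ['', 'Violet', 'Indigo', '']
```

If such empty strings are unwanted, they have to be removed in a separate step after splitting.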

🌍Related Read: Python String split()

Method 2: Using re.split()

Another way of separating a string by using the hyphen as a separator is to use the re.split() method from Python’s re (regular expression) module. The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

Approach: You can use the re.split() method as re.split('-', text) where '-' returns a match whenever the string contains a hyphen. Whenever any hyphen is encountered, the text gets separated and the split substring gets stored as an element within the resultant list.

Code:

import re
text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"
print(re.split('-', text))

# ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']
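
One situation where re.split() may be more convenient than str.split() is when the hyphens are surrounded by inconsistent whitespace. The following is a minimal sketch; the messy input string and the pattern \s*-\s* are assumptions chosen for illustration, not part of the original example:

```python
import re

# Hypothetical input with inconsistent spacing around the hyphens.
messy = "Violet - Indigo- Blue -Green"

# \s*-\s* matches a hyphen together with any whitespace on either side,
# so the whitespace is consumed as part of the separator.
colors = re.split(r"\s*-\s*", messy)
print(colors)

# ['Violet', 'Indigo', 'Blue', 'Green']
```

With plain text.split("-"), the surrounding spaces would remain attached to the words and would need to be stripped afterwards.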

🌍Related Read: Python Regex Split

Method 3: Using filter()

Note: This approach is useful when the split would otherwise leave empty strings in the resultant list, for example when the string has leading, trailing, or consecutive hyphens.

Python’s built-in filter() function keeps only the elements that pass a filtering condition. It takes two arguments: function and iterable. The function assigns a Boolean value to each element in the iterable to check whether the element will pass the filter or not. It returns an iterator over the elements that pass the filtering condition.

Approach: First split the string by hyphen with split('-'), then pass the resulting list to filter(). The function takes None as the first argument and the list of split strings as the second argument. With None as the function, filter() keeps only the truthy elements, so it removes any empty strings from the list. As filter() returns an iterator, we need to use list() to convert it into a list.

Code:

text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"

print(list(filter(None, text.split('-'))))

# ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']
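
The example string above contains no empty fields, so the effect of filter() is not visible there. Here is a small sketch with an assumed input that has leading, trailing, and doubled hyphens. It also shows an equivalent list comprehension, which is offered as an alternative spelling rather than a separate method from the article:

```python
# Assumed input with leading, trailing, and consecutive hyphens.
text = "-Violet--Indigo-Blue-"

parts = text.split("-")
print(parts)
# ['', 'Violet', '', 'Indigo', 'Blue', '']

# filter(None, ...) drops every falsy item, which here means the empty strings.
with_filter = list(filter(None, parts))

# The same result written as a list comprehension.
with_comprehension = [p for p in parts if p]

print(with_filter)
# ['Violet', 'Indigo', 'Blue']
print(with_filter == with_comprehension)
# True
```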

🌍Related Read: Python filter()

Method 4: Using re.findall()

The re.findall(pattern, string) method scans the string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of the matching strings in the order in which they were found.

Approach: You can use the re.findall() method from the regex module to split the string by hyphen. Use '[^-]+' as the pattern that is fed into the findall function to solve the problem. Simply put, it matches every run of one or more characters that are not hyphens, so each word between the hyphens becomes one item in the list.

Code:

import re
text = "Violet-Indigo-Blue-Green-Yellow-Orange-Red"
print(re.findall(r'[^-]+', text))

# ['Violet', 'Indigo', 'Blue', 'Green', 'Yellow', 'Orange', 'Red']
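
Because the pattern '[^-]+' requires at least one non-hyphen character per match, re.findall() never returns empty strings. The sketch below, using an assumed input with extra hyphens, checks that it gives the same result as splitting and then filtering out the empty strings:

```python
import re

# Assumed input with leading, trailing, and consecutive hyphens.
text = "-Violet--Indigo-Blue-"

# Each match is a maximal run of characters that are not hyphens.
found = re.findall(r"[^-]+", text)

# For comparison: split on the hyphen, then drop the empty strings.
filtered = list(filter(None, text.split("-")))

print(found)
# ['Violet', 'Indigo', 'Blue']
print(found == filtered)
# True
```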

🌍Related Read: Python re.findall() – Everything You Need to Know

Python | Split String by Dot

Now that we have gone through numerous ways of solving the given problem, here’s a similar programming challenge for you to solve.

Challenge: You are given a string that contains dots in it. How will you split the string using a dot as a delimiter? Consider the code below and try to split the string by dot.

# Input:
text = "stars.moon.sun.sky"

# Expected Output:
['stars', 'moon', 'sun', 'sky']

Try to solve the problem yourself before looking into the given solutions.

Solution: Here are the different methods to split a string by using the dot as a delimiter/separator:

text = "stars.moon.sun.sky"
# Method 1
print(text.split("."))

# Method 2
print(list(filter(None, text.split('.'))))

# Method 3
import re
print(re.findall(r'[^.]+', text))

# Method 4: the dot is escaped because an unescaped '.' matches any character
print(re.split(r'\.', text))

# OUTPUT: ['stars', 'moon', 'sun', 'sky']
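
One detail worth keeping in mind for the dot challenge: in a regular expression, an unescaped '.' matches any character (except a newline by default), so it has to be escaped before it can act as a literal delimiter. The following is a small sketch of the difference; the variable names are chosen only for illustration:

```python
import re

text = "stars.moon.sun.sky"

# Unescaped dot: every character counts as a separator,
# so only empty strings remain between the matches.
unescaped = re.split(".", text)

# Escaped dot: only literal dots are treated as separators.
escaped = re.split(r"\.", text)

# re.escape() builds the escaped pattern for you.
auto_escaped = re.split(re.escape("."), text)

print(set(unescaped))
# {''}
print(escaped)
# ['stars', 'moon', 'sun', 'sky']
print(auto_escaped == escaped)
# True
```

The plain string method text.split(".") does not have this issue, since it treats its argument as literal text rather than as a pattern.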

Conclusion

Hurrah! We have successfully solved the given problem using as many as four different ways. We then went on to solve a similar coding challenge. I hope this article helped you. Please subscribe and stay tuned for more interesting articles!

Happy coding! 🙂

