Python | Split String until Character/Substring

Summary: You can use one of the following methods to split a string until a given character or substring:

  • Using split / rsplit
  • Using string slicing
  • Using regex
  • Using partition / rpartition

Minimal Example

# Given String
text = "Learn regex and become a pro coder!"

# Method 1
res = text.split('x')[0] + "x"
print(res)

# Method 2
print(text[:text.index("x")+len('x')])

# Method 3
import re
print(re.findall('(.*?x)', text)[0])

# Method 4
print(text.partition('x')[0]+text.partition('x')[1])

# Output: Learn regex

Problem Formulation

📜Problem: Given a string, how will you split it until a given character or substring? The output must contain only the part of the string up to and including the given delimiter.

Let’s visualize the problem with the help of two examples:

Example 1

# Given String
text = "Small Town Boy in a Big Arcade"
sub_string = "Boy"

# Expected Output
Small Town Boy

Example 2

# Given String
text_ = "https://blog.finxter.com/subscribe"
character = "/"

# Expected Output
https://blog.finxter.com/

Method 1: Using split()

Solution to Example 1

Approach: First, split the string using “Boy” as the delimiter. The split() method returns a list of substrings, so the part before the delimiter is the first element of that list, which you can extract with list indexing [0] (indexing starts at 0). This gives only the substring before the separator (i.e., “Small Town ” in this case), but the delimiter itself must be included as well. To fix that, simply concatenate “Boy” to the extracted part.

Code:

# Given String
text = "Small Town Boy in a Big Arcade"
res = text.split('Boy')[0] + "Boy"
print(res)

# Small Town Boy

Note: The split() method splits the string at the given separator and returns a list of the resulting substrings. The separator itself is not included in any of them.
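To see what split() actually hands back, here is a short sketch using the same string. It also shows the optional maxsplit argument and one caveat of the split()[0] + delimiter idiom: if the delimiter never occurs, split() returns a one-element list and the idiom still appends the delimiter. The split_until() function is a hypothetical helper written only for this illustration; it is not part of Python's standard library.

```python
text = "Small Town Boy in a Big Arcade"

# split() returns a list; the delimiter itself is dropped
parts = text.split('Boy')
print(parts)  # ['Small Town ', ' in a Big Arcade']

# maxsplit=1 stops after the first occurrence; only the head is needed anyway
print(text.split('Boy', 1)[0] + 'Boy')  # Small Town Boy

# Caveat: an absent delimiter yields [text], and the idiom still appends it
print(text.split('Girl')[0] + 'Girl')  # Small Town Boy in a Big ArcadeGirl


def split_until(s, delim):
    """Hypothetical helper: s up to and including the first delim,
    or s unchanged if delim does not occur."""
    pieces = s.split(delim, 1)
    return pieces[0] + delim if len(pieces) == 2 else s


print(split_until(text, 'Boy'))   # Small Town Boy
print(split_until(text, 'Girl'))  # Small Town Boy in a Big Arcade
```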

🌎Related Read: Python String split()

Solution to Example 2

In the second example, you have to split at a given character and also ensure that the character is included in the final string. Here, the character “/” appears more than once in the string, but you must split only at its last occurrence. The rsplit() method takes care of this: it splits the string starting from the right side, and with maxsplit set to 1 it splits only at the last occurrence of the separator.

Code:

# Given String
text_ = "https://blog.finxter.com/subscribe"
print(text_.rsplit('/', 1)[0]+"/")

# https://blog.finxter.com/
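The difference between split() and rsplit() is easiest to see side by side. This sketch prints both with maxsplit=1, plus rsplit() without maxsplit, which splits at every separator just like split() does.

```python
text_ = "https://blog.finxter.com/subscribe"

# split() works left to right: the first "/" is the one in "https:/"
print(text_.split('/', 1))   # ['https:', '/blog.finxter.com/subscribe']

# rsplit() works right to left: maxsplit=1 splits only at the last "/"
print(text_.rsplit('/', 1))  # ['https://blog.finxter.com', 'subscribe']

# Without maxsplit, rsplit() splits at every "/" (same pieces as split())
print(text_.rsplit('/'))     # ['https:', '', 'blog.finxter.com', 'subscribe']

head = text_.rsplit('/', 1)[0] + '/'
print(head)  # https://blog.finxter.com/
```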

Method 2: Using String Slicing

Prerequisite: String slicing is a concept of carving out a substring from a given string. Use slicing notation s[start:stop:step] to access every step-th element starting from index start (included) and ending in index stop (excluded). All three arguments are optional, so you can skip them to use the default values.
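A few slices of a short string illustrate the defaults described above (the variable s is just an example):

```python
s = "Small Town Boy"

print(s[0:5])   # Small  (start included, stop excluded)
print(s[:5])    # Small  (start defaults to 0)
print(s[11:])   # Boy    (stop defaults to len(s))
print(s[::2])   # SalTw o  (every 2nd character)
print(s[::-1])  # yoB nwoT llamS  (a negative step walks backwards)
```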

Solution to Example 1

Approach: First, use the index() method to find the position of the delimiter in the text. Next, slice the string from the beginning up to the end of the delimiter substring. Since the stop index of a slice is excluded, the stop index you need is the position just after the delimiter's last character, which you get by adding the delimiter's length to its starting index.

Code:

text = "Small Town Boy in a Big Arcade"
print(text[:text.index("Boy")+len('Boy')])

# Small Town Boy
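The index arithmetic for the first example can be spelled out step by step; start and stop are names introduced only for this sketch:

```python
text = "Small Town Boy in a Big Arcade"
delim = "Boy"

start = text.index(delim)  # 11: where "Boy" begins
stop = start + len(delim)  # 14: one past the last character of "Boy"
print(start, stop)         # 11 14
print(text[:stop])         # Small Town Boy
```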

Note:

The index() method returns the index of the first occurrence of the specified substring. It works like find(), but it raises a ValueError if the substring is not found.
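The difference matters for this technique. With find(), a missing delimiter returns -1, and the slice silently produces a wrong result instead of failing. The sketch below shows both behaviours; slice_until() is a hypothetical helper that assumes returning the whole string is the desired fallback when the delimiter is absent.

```python
text = "Small Town Boy in a Big Arcade"

# find() reports a missing substring as -1 ...
print(text.find("Girl"))                       # -1
# ... so the slicing trick silently returns a wrong prefix: -1 + 4 == 3
print(text[:text.find("Girl") + len("Girl")])  # Sma

# index() fails loudly instead
try:
    text.index("Girl")
except ValueError:
    print("index() raised ValueError")


def slice_until(s, delim):
    """Hypothetical helper: prefix of s up to and including the first
    delim, or s unchanged if delim does not occur."""
    pos = s.find(delim)
    return s if pos == -1 else s[:pos + len(delim)]


print(slice_until(text, "Boy"))   # Small Town Boy
print(slice_until(text, "Girl"))  # Small Town Boy in a Big Arcade
```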

🌎Related Reads:
String Slicing in Python

Python String index()

Solution to Example 2

Approach: Once again, in the second example you have to split the string at the last occurrence of the character “/”. The rindex() method handles this: it returns the highest index in the string where the substring is found. Because the slice stops just before that index, the “/” is excluded from the result, so simply concatenate it to the final output.

Code:

text_ = "https://blog.finxter.com/subscribe"
print(text_[:text_.rindex("/")]+"/")

# https://blog.finxter.com/
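Instead of concatenating “/” after the slice, you could also extend the stop index by one, since the delimiter here is a single character (for a longer delimiter you would add its length instead). A minimal sketch of both variants:

```python
text_ = "https://blog.finxter.com/subscribe"

pos = text_.rindex("/")   # position of the last "/"
print(pos)                # 24
print(text_[:pos] + "/")  # https://blog.finxter.com/
print(text_[:pos + 1])    # https://blog.finxter.com/
```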

Method 3: Using regex

The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.

Approach: Use re.findall() with the non-greedy pattern (.*?/), which matches as few characters as possible up to and including the next “/”. Because the “/” sits inside the parentheses, it is part of each captured group. The URL contains several “/” characters, so findall() returns one chunk per “/”, and joining these chunks gives everything up to and including the last “/”. For the first example, where the delimiter is ‘Boy’, the first match (index [0]) already contains everything up to and including “Boy”. The solutions below show how this works.

Code:

import re

text = "Small Town Boy in a Big Arcade"
text_ = "https://blog.finxter.com/subscribe"

print(re.findall('(.*?Boy)', text)[0])
# Output: Small Town Boy

print(''.join(re.findall('(.*?/)', text_)))
# Output: https://blog.finxter.com/
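To see why the second solution needs ''.join(), it helps to print the list that findall() returns for the URL. Each non-greedy match ends at the nearest “/”, so there is one chunk per “/”, and the trailing “subscribe” is not matched at all:

```python
import re

text_ = "https://blog.finxter.com/subscribe"

chunks = re.findall(r'(.*?/)', text_)
print(chunks)           # ['https:/', '/', 'blog.finxter.com/']
print(''.join(chunks))  # https://blog.finxter.com/
```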

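Understanding the pattern (.*?):
The dot matches any character except a newline, the asterisk allows zero or more occurrences, and the trailing question mark makes the quantifier non-greedy, so it matches as few characters as possible. The parentheses capture the match as a group. In the solutions above, the group collects all characters up to and including the first “Boy” in the first case, and up to and including each “/” in the second case.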
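The non-greedy ? is what makes the pattern stop at the first delimiter. As an alternative not used in the solutions above, a greedy .* anchored with re.match() backtracks to the last “/” and returns the whole prefix in a single match. If the delimiter contains regex metacharacters such as “.”, it should be passed through re.escape() first. The delimiter “finxter.” below is just an example:

```python
import re

text_ = "https://blog.finxter.com/subscribe"

# Non-greedy: .*? stops at the FIRST "/" it can reach
print(re.match(r'.*?/', text_).group())  # https:/

# Greedy: .* first runs to the end, then backtracks to the LAST "/"
print(re.match(r'.*/', text_).group())   # https://blog.finxter.com/

# re.escape() turns "." into a literal dot instead of "any character"
delim = "finxter."
print(re.match('.*?' + re.escape(delim), text_).group())  # https://blog.finxter.
```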

🌎Related Read: Python Regex Match

Method 4: Using partition

The partition() method searches for a separator substring and returns a tuple with three strings: (1) everything before the separator, (2) the separator itself, and (3) everything after it.
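Unpacking the tuple shows all three parts at once, and calling partition() a single time avoids searching the string twice. If the separator is missing, partition() returns the whole string followed by two empty strings, so concatenating the first and second elements simply yields the original string:

```python
text = "Small Town Boy in a Big Arcade"

# Unpack the (before, separator, after) tuple once
before, sep, after = text.partition('Boy')
print((before, sep, after))  # ('Small Town ', 'Boy', ' in a Big Arcade')
print(before + sep)          # Small Town Boy

# Missing separator: (whole string, '', '')
missing = text.partition('Girl')
print(missing)                  # ('Small Town Boy in a Big Arcade', '', '')
print(missing[0] + missing[1])  # Small Town Boy in a Big Arcade
```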

Solution to Example 1

Approach: Call partition() with “Boy” as the separator. Since only the substring up to and including the delimiter is needed, concatenate the first and second elements of the returned tuple (everything before the separator plus the separator itself) and print the result.

Code:

text = "Small Town Boy in a Big Arcade"
print(text.partition('Boy')[0]+text.partition('Boy')[1])

# Small Town Boy

🌎Related Read: Python String partition()

Solution to Example 2

In the second case, the string must be split at the last occurrence of the “/” character. Since this character occurs more than once, using partition() directly would give the wrong output, because it splits at the first occurrence of “/”. Instead, use the rpartition() method, which searches for the last occurrence of the separator substring and returns a tuple with three strings: (1) everything before the separator, (2) the separator itself, and (3) everything after it.

# Given String
text_ = "https://blog.finxter.com/subscribe"
print(text_.rpartition('/')[0]+text_.rpartition('/')[1])

# https://blog.finxter.com/
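The sketch below compares partition() and rpartition() on the URL and unpacks the rpartition() result once instead of calling it twice. Note one asymmetry: when the separator is missing, rpartition() puts the whole string in the last position, so head + sep becomes an empty string rather than the original text. The string “no-slash-here” is only an example:

```python
text_ = "https://blog.finxter.com/subscribe"

print(text_.partition('/'))   # ('https:', '/', '/blog.finxter.com/subscribe')
print(text_.rpartition('/'))  # ('https://blog.finxter.com', '/', 'subscribe')

# Unpack once instead of calling rpartition() twice
head, sep, _ = text_.rpartition('/')
print(head + sep)  # https://blog.finxter.com/

# Missing separator: rpartition() puts the whole string LAST
print("no-slash-here".rpartition('/'))  # ('', '', 'no-slash-here')
```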

🌎Related Read: Python String rpartition()

Conclusion

Hurrah! We have successfully solved the given problem in four different ways. I hope you enjoyed this article and that it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!


Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.