Python | Split String after Character


Problem Formulation

📜 Problem: Given a string, how can you split it after a specific character?

Splitting a string after a certain character or substring can mean slightly different things depending on the scenario, so let’s look at the problem through a few examples.

🏷️Scenario 1: Splitting the string by a single character as a delimiter

In the following example, the string is split right after every occurrence of the given character: each resulting piece runs up to and including that character, and the next piece starts with whatever follows it.

# Input 
text = "Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets"
split_char = "s"
# Expected Output:
['Lorem ips', 'um amet dolor s', 'it s', ' amet cons', 'ectetur adipis', 'ametcing. amets']

🏷️Scenario 2: Splitting after a character, starting from another character

Suppose you are given the string “Welcome to the world of Python”. You have to split the string after every ‘l’ that occurs after the first occurrence of the letter ‘t’.

# Input:
text = "Welcome to the world of Python"
# Expected Output:
['Welcome to the worl', 'd of Python']

Note: The first ‘l’ occurs in the substring ‘Welcome’; however, it is skipped because the letter ‘t’ has not occurred before it.

🏷️Scenario 3: Split String after Substring

In the following example, you are required to split a given string after the occurrence of a substring.

# Input
text = "Lorem ipsum amet dolor sit amet conseametctetur adipisametcing. amet"
split_substring = 'amet'
# Expected Output
['Lorem ipsum amet', ' dolor sit amet', ' conseamet', 'ctetur adipisamet', 'cing. amet']

Note: In every scenario, the split character or substring is kept intact, i.e., it is not eliminated from the resulting list of split strings. So, the bottom line is: you not only have to split the given string after a specific character or substring, but also keep the separators.

Related Read: How To Split A String And Keep The Separators?

Okay! So, we have three different scenarios at hand. Let’s dive into the different ways of solving each scenario.

Method 1: Using split()

You can use the str.split() method to split the given string at every occurrence of a specific character or substring. Since split() drops the separator, you then have to attach it back to the pieces.
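To see why the separator has to be re-attached, here is what a plain split() call returns for the scenario 1 input. Every ‘s’ disappears, and because the text ends with ‘s’, split() also returns an empty string as the last item:

```python
text = "Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets"

# A plain split() removes the separator from the result
parts = text.split('s')
print(parts)
# ['Lorem ip', 'um amet dolor ', 'it ', ' amet con', 'ectetur adipi', 'ametcing. amet', '']
```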

🏷️Solution to Scenario 1

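text = "Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets"
# Split at every 's' and attach the 's' back to the end of each piece
li = [i + "s" for i in text.split('s')]
if li[-1] == "s":
    # The text ended with 's', so the last item is just the appended separator
    li.pop(-1)
else:
    # Otherwise the last piece received one 's' too many; strip it again
    li[-1] = li[-1].rstrip('s')
print(li)

# Output: ['Lorem ips', 'um amet dolor s', 'it s', ' amet cons', 'ectetur adipis', 'ametcing. amets']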

Explanation:

  • In the above solution, a list comprehension splits the given string into a list of substrings with the split() method.
  • split() removes the character that serves as the delimiter, but we want to keep it at the end of each substring. To achieve that, the comprehension concatenates the separator character to every substring returned by split().
  • This creates one more problem at the end of the list, and how you handle it depends on how the string ends:
    • If the string does not end with the delimiter, the last substring gets an extra delimiter character that was not in the original text. You can remove it with the rstrip() method.
    • If the string ends with the delimiter, split() returns an empty string as the last item, so the list ends with an item that consists of the delimiter alone. You can remove that item with the pop() method.
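The same steps can be wrapped in a small reusable function. The following is a minimal sketch, not part of the original solution: the name split_after is hypothetical, and instead of rstrip() it slices off exactly one copy of the separator, which assumes nothing about the separator’s characters and therefore also works for multi-character separators.

```python
def split_after(text, sep):
    """Split text after every occurrence of sep, keeping sep at the end of each piece.

    split_after is a hypothetical helper that generalizes the split()-based approach.
    """
    # Split at every separator and glue the separator back onto each piece
    pieces = [piece + sep for piece in text.split(sep)]
    if pieces[-1] == sep:
        # text ended with sep: the last item is just the separator we appended
        pieces.pop()
    else:
        # otherwise the last piece got one sep too many; slice exactly one copy off
        pieces[-1] = pieces[-1][:-len(sep)]
    return pieces


print(split_after("Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets", "s"))
print(split_after("a-b-c", "-"))
```

Because every piece keeps its separator, joining the result with ''.join(...) gives back the original string.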

🏷️Solution to Scenario 2

Code:

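# Given string and separator
s = "Welcome to the world of Python"
w = "l"
# Find the index of the first 't'
first = s.index('t')
# Split only the part starting at the first 't' at every 'l'
res = s[first:].split(w)
# Re-attach 'l' to every piece except the last one
res = [x + w for x in res[:-1]] + [res[-1]]
# Prepend the part of the string that comes before the first 't'
res[0] = s[:first] + res[0]
# Trim leading/trailing whitespace from each piece
res = [x.strip() for x in res]
print(res)

# OUTPUT: ['Welcome to the worl', 'd of Python']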

Explanation: 

  • In the above code, we first use the index() method to find the first occurrence of the letter ‘t’ in the string. The str.index() method returns the index of the first occurrence of the specified substring and raises a ValueError if the substring is not found.
  • Next, we use the split() method to split the part of the string that starts at the letter ‘t’ at every ‘l’, and store the pieces in the variable res.
  • split() removes the separator, so ‘l’ is appended back to every piece except the last one. The first piece also lacks the part of the string before the letter ‘t’, so that prefix is prepended to it.
  • Finally, the strip() method trims leading and trailing whitespace from each piece and returns new strings, so res contains the required split substrings. Leave out this step if the whitespace should stay intact.
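The steps above can be generalized to any start character and separator. This is a sketch under stated assumptions: the function name split_after_from is hypothetical, it returns the whole string as a single piece when the start character never occurs (one possible choice, not the only one), it does not strip whitespace, and it re-attaches the separator to every piece, so several separators after the start character are handled too.

```python
def split_after_from(text, sep, start):
    """Split text after every `sep` found at or after the first `start` character.

    Hypothetical helper for illustration; returns [text] if `start` never occurs.
    """
    first = text.find(start)
    if first == -1:
        return [text]  # assumption: no start character means no split at all
    prefix, rest = text[:first], text[first:]
    pieces = rest.split(sep)
    # Re-attach the separator to every piece except the last one
    pieces = [p + sep for p in pieces[:-1]] + [pieces[-1]]
    # The text before the start character belongs to the first piece
    pieces[0] = prefix + pieces[0]
    # If the text ended with the separator, split() left an empty last piece
    if len(pieces) > 1 and pieces[-1] == "":
        pieces.pop()
    return pieces


print(split_after_from("Welcome to the world of Python", "l", "t"))
print(split_after_from("Welcome to the world, hello", "l", "t"))
```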

Alternate Formulation: Instead of the index() method, you can also use the find() method, which likewise returns the index of the first occurrence of the specified substring but returns -1 instead of raising an error when the substring is missing. The following code shows how find() solves scenario 2.

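s = "Welcome to the world of Python"
w = "l"
# Part of the string from the first 't' onwards (find() returns its index)
st = s[s.find("t"):]
# Split that part at every 'l'
res = st.split(w)
# Part of the string before the first 't'
st2 = s[:s.find("t")]
# Re-attach 'l' to every piece except the last, then prepend the prefix
res = [x + w for x in res[:-1]] + [res[-1]]
res[0] = st2 + res[0]
res = [x.strip() for x in res]
print(res)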

# OUTPUT: ['Welcome to the worl', 'd of Python']
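The practical difference between the two methods shows up when the character is missing: index() raises a ValueError, while find() returns -1. With find(), a missing ‘t’ would silently turn s[s.find('t'):] into s[-1:], i.e. just the last character, so it may be worth checking the result before slicing. A small demonstration, using ‘z’ as an arbitrary character that does not occur:

```python
s = "Welcome to the world of Python"

pos_t = s.find("t")  # 8, the index of the first 't'
pos_z = s.find("z")  # -1, because 'z' does not occur
tail = s[pos_z:]     # 'n': slicing from -1 keeps only the last character

try:
    s.index("z")
    index_error = None
except ValueError as err:
    index_error = str(err)  # 'substring not found'

print(pos_t, pos_z, repr(tail), index_error)
```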

🏷️Solution to Scenario 3

To split the given string after every occurrence of a substring, you can use the same technique as in the solution to scenario 1, with the substring as the separator. The code below shows the solution.

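text = "Lorem ipsum amet dolor sit amet conseametctetur adipisametcing. amet"
sep = "amet"
# Split at every 'amet' and attach 'amet' back to the end of each piece
li = [i + sep for i in text.split(sep)]
if li[-1] == sep:
    # The text ended with 'amet', so the last item is just the appended separator
    li.pop(-1)
else:
    # Otherwise remove the one extra 'amet' appended to the last piece
    li[-1] = li[-1][:-len(sep)]
print(li)

# OUTPUT: ['Lorem ipsum amet', ' dolor sit amet', ' conseamet', 'ctetur adipisamet', 'cing. amet']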
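A note on removing the extra copy of a multi-character separator: rstrip() is not suitable here, because it treats its argument as a set of characters rather than as a substring. On Python 3.9 and later, str.removesuffix() removes exactly one trailing copy; when the string is known to end with the separator, slicing with [:-len(sep)] does the same on any Python 3 version. A quick comparison on a made-up word:

```python
word = "meetamet"

# rstrip() strips any of the characters a, m, e, t from the right
stripped = word.rstrip("amet")
# removesuffix() (Python 3.9+) removes exactly one trailing "amet"
removed = word.removesuffix("amet")
# Slicing gives the same result, provided word really ends with "amet"
sliced = word[:-len("amet")]

print(repr(stripped), repr(removed), repr(sliced))
```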

Method 2: Using regex

Another way to solve these problems is Python’s re module. For scenarios 1 and 3, you can call re.split() with the separator wrapped in a capturing group, such as '(s)', and use a list comprehension to drop any empty strings from the result. Thanks to the capturing group, the separators are kept, but each one appears as an independent item in the resulting list. To handle this, merge every separator into the item before it: loop over the list, and whenever the current item is the separator, concatenate it to the previous item and then remove the separator item with the pop() method.
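The key detail is the capturing group: re.split() returns the text matched by a group as extra list items, while a pattern without a group discards it. A short comparison on a made-up string:

```python
import re

text = "one-two-three"

# Without a group the separator is dropped
without_group = re.split("-", text)
# With a capturing group each separator becomes its own list item
with_group = re.split("(-)", text)

print(without_group)  # ['one', 'two', 'three']
print(with_group)     # ['one', '-', 'two', '-', 'three']
```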

Code:

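# Solution to scenario 1
import re

text = "Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets"
# The capturing group '(s)' keeps every 's' as a separate item; drop empty strings
li = [x for x in re.split('(s)', text) if x != '']
# Merge each 's' into the piece before it, then remove the standalone 's'
# (popping inside the loop relies on text pieces and separators alternating, as here)
for count, element in enumerate(li):
    if element == 's':
        li[count - 1] = li[count - 1] + li[count]
        li.pop(count)
print(li)

# OUTPUT: ['Lorem ips', 'um amet dolor s', 'it s', ' amet cons', 'ectetur adipis', 'ametcing. amets']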

# Solution to Scenario 3
import re

text = "Lorem ipsum amet dolor sit amet conseametctetur adipisametcing. amet"
# The capturing group '(amet)' keeps every 'amet' as a separate item; drop empty strings
li = [x for x in re.split('(amet)', text) if x != '']
# Merge each 'amet' into the piece before it, then remove the standalone 'amet'
for count, element in enumerate(li):
    if element == 'amet':
        li[count - 1] = li[count - 1] + li[count]
        li.pop(count)
print(li)

# OUTPUT: ['Lorem ipsum amet', ' dolor sit amet', ' conseamet', 'ctetur adipisamet', 'cing. amet']
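A shorter regex variant, which is a different technique from the merge-and-pop approach above: since Python 3.7, re.split() can split on a pattern that matches an empty string, so a lookbehind such as (?<=s) splits right after each separator without consuming it, and no merge step is needed. In this sketch, re.escape() makes the separator safe to use inside the pattern, the filter removes the empty string that appears when the text ends with the separator, and the helper name split_after_re is hypothetical.

```python
import re


def split_after_re(text, sep):
    """Split text right after every occurrence of sep (hypothetical helper)."""
    # (?<=...) is a zero-width lookbehind: it matches the empty position after sep
    pattern = "(?<=" + re.escape(sep) + ")"
    # Drop the empty string produced when text ends with sep
    return [piece for piece in re.split(pattern, text) if piece]


print(split_after_re("Lorem ipsum amet dolor sit s amet consectetur adipisametcing. amets", "s"))
print(split_after_re("Lorem ipsum amet dolor sit amet conseametctetur adipisametcing. amet", "amet"))
```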

Solving scenario 2 is a little trickier. First, extract the part of the string that starts at the first occurrence of the letter ‘t’ and split it using the separator ‘l’. The first item of the resulting list is then missing the part of the original string that comes before the letter ‘t’, so prepend that part to it. Finally, merge each separator, which appears as an independent item in the list, into the preceding item and remove it with pop(), just as in the solutions above.

Code:

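import re

s = "Welcome to the world of Python"
first = s.find('t')  # index of the first 't'
# Split the part from the first 't' onwards at every 'l', keeping each 'l' as its own item
li = [x for x in re.split('(l)', s[first:]) if x != '']
# Prepend the part of the string that comes before the first 't'
li[0] = s[:first] + li[0]
# Merge each 'l' into the piece before it, then remove the standalone 'l'
for count, element in enumerate(li):
    if element == 'l':
        li[count - 1] = li[count - 1] + li[count]
        li.pop(count)
print(li)

# OUTPUT: ['Welcome to the worl', 'd of Python']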
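If you are on Python 3.7 or later, a zero-width lookbehind is another option for scenario 2: re.split() with the pattern (?<=l) splits right after each ‘l’ without removing it, so no merge-and-pop step is needed. The sketch below applies it only to the part of the string that starts at the first ‘t’ and then puts the prefix back; it assumes the string contains a ‘t’, and the variable names are illustrative.

```python
import re

s = "Welcome to the world of Python"
first = s.find("t")  # index of the first 't' (assumed to exist here)

# Split the tail right after every 'l' without consuming it; drop empty strings
pieces = [p for p in re.split("(?<=l)", s[first:]) if p]
# Put the text that precedes the first 't' back in front of the first piece
pieces[0] = s[:first] + pieces[0]
print(pieces)  # ['Welcome to the worl', 'd of Python']
```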

Conclusion

Congratulations! You have successfully learned to split a given string after the occurrence of a certain character or substring. I highly recommend having a look at related tutorials that cover similar scenarios.

Stay tuned and subscribe for more interesting solutions and discussions.

