Python Split String by Uppercase (Capital) Letters

✨Summary: You can use different functions of the regular expressions library to split a given string by uppercase letters. Another approach is to use a list comprehension. 

Minimal Example

import re

# Given String
text = "AbcLmnZxy"

# Method 1
print(re.findall('[A-Z][^A-Z]*', text))
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 2
print(re.split('(?<=.)(?=[A-Z])', text))
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 3
print(re.sub(r"([A-Z])", r" \1", text).split())
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 4
pos = [i for i, e in enumerate(text+'A') if e.isupper()]
print([text[pos[j]:pos[j+1]] for j in range(len(pos)-1)])
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 5
print("".join([(" "+i if i.isupper() else i) for i in text]).strip().split())
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

Problem Formulation

📜Problem: Given a string containing uppercase and lowercase letters, how do you split it at every occurrence of an uppercase letter?

Example

# Input
text = "UpperCaseSplitString"
# Output
['Upper', 'Case', 'Split', 'String']

In the above example, the string is split immediately before each uppercase character, and the resulting substrings are stored in a list.


Let’s dive into the solutions to the given problem.

Method 1: Using re.findall

Approach: Use re.findall('[A-Z][^A-Z]*', text) to collect the pieces of the string that begin with an uppercase letter. The expression [A-Z][^A-Z]* matches one uppercase letter followed by zero or more characters that are not uppercase letters, so each match runs up to (but does not include) the next uppercase letter. All matches are returned in a list.

Code:

import re
text = "UpperCaseSplitString"
res = re.findall('[A-Z][^A-Z]*', text)
print(res)

# OUTPUT: ['Upper', 'Case', 'Split', 'String']

Note: The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.
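One thing worth checking is how this pattern treats text that does not start with an uppercase letter. Because every match must begin with [A-Z], a leading lowercase run is not part of any match and is silently dropped. The sketch below shows this, together with one possible alternative pattern (an assumption for illustration, not part of the original solution) that keeps the leading run.

```python
import re

text = "lowerCamelCase"

# Original pattern: every match must start with an uppercase letter,
# so the leading "lower" is not part of any match.
strict = re.findall('[A-Z][^A-Z]*', text)
print(strict)  # ['Camel', 'Case']

# Hypothetical variant: also accept a run of non-uppercase characters on its own.
lenient = re.findall('[^A-Z]+|[A-Z][^A-Z]*', text)
print(lenient)  # ['lower', 'Camel', 'Case']
```

Which behavior you want depends on the input; for strings that always start with a capital letter, both patterns give the same result.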

🌎Recommended Read: Python re.findall()

Method 2: Using re.split

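Approach: You can also use the split function of the re module with the pattern '(?<=.)(?=[A-Z])'. The pattern matches the empty position that is preceded by at least one character (the lookbehind (?<=.)) and followed by an uppercase letter (the lookahead (?=[A-Z])), so the string is split before every uppercase letter except one at the very start. Splitting on such empty matches requires Python 3.7 or later.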

Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

Code:

import re
text = "UpperCaseSplitString"
res = re.split('(?<=.)(?=[A-Z])', text)
print(res)

# OUTPUT: ['Upper', 'Case', 'Split', 'String']
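To see why the lookbehind (?<=.) is part of the pattern, it can help to compare the result with and without it. The following is a small sketch, assuming Python 3.7 or later, where re.split also splits on empty matches.

```python
import re

text = "UpperCaseSplitString"

# Lookahead only: the empty match at position 0 also causes a split,
# which produces an empty first element.
without_lookbehind = re.split('(?=[A-Z])', text)
print(without_lookbehind)  # ['', 'Upper', 'Case', 'Split', 'String']

# With the lookbehind, position 0 has no preceding character,
# so no split happens there.
with_lookbehind = re.split('(?<=.)(?=[A-Z])', text)
print(with_lookbehind)  # ['Upper', 'Case', 'Split', 'String']
```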

🌎Recommended Read: Python Regex Split

Method 3: Using re.sub

Approach: Yet another function of the re module that lets you split the string on uppercase letters is re.sub. The idea here is to insert a space before every uppercase letter and then do a normal split on the resulting string using the split() method.

Note: The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.

Code: 

import re
text = "UpperCaseSplitString"
res = re.sub(r"([A-Z])", r" \1", text).split()
print(res)

# OUTPUT: ['Upper', 'Case', 'Split', 'String']
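Because this approach relies on split(), any whitespace that is already in the text also acts as a separator. The sketch below uses a made-up input with a space in it to show the intermediate string and how the result differs from the re.split approach of Method 2.

```python
import re

text = "helloWorld fooBar"

# Step 1: put a space in front of every uppercase letter.
spaced = re.sub(r"([A-Z])", r" \1", text)
print(repr(spaced))  # 'hello World foo Bar'

# Step 2: split on whitespace; the original space splits too.
via_sub = spaced.split()
print(via_sub)  # ['hello', 'World', 'foo', 'Bar']

# For comparison: re.split only breaks before uppercase letters.
via_split = re.split('(?<=.)(?=[A-Z])', text)
print(via_split)  # ['hello', 'World foo', 'Bar']
```

If the input never contains whitespace, both approaches split at the same positions.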

🌎Recommended Read: Python Regex Sub

Method 4: Using List Comprehension

Prerequisite: 

List comprehension is a compact way of creating lists. The simple formula is [expression + context].

  • Expression: What to do with each list element?
  • Context: What elements to select? The context consists of an arbitrary number of for and if statements.

The example [x for x in range(3)] creates the list [0, 1, 2].
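A context can also contain an if clause that filters the elements. As a small illustration (the variable names are chosen here for the example), the following comprehensions pick out the uppercase letters of a string and their positions:

```python
text = "AbcLmnZxy"

# Expression: ch; context: every character of text that is uppercase.
uppercase_letters = [ch for ch in text if ch.isupper()]
print(uppercase_letters)  # ['A', 'L', 'Z']

# Same filter, but keep the index instead of the character.
uppercase_positions = [i for i, ch in enumerate(text) if ch.isupper()]
print(uppercase_positions)  # [0, 3, 6]
```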

🌎Related Read: List Comprehension in Python - A Helpful Illustrated Guide

Approach: The idea here is to use two list comprehensions. The first one finds the positions of all capital letters in the given string; the sentinel 'A' appended to the text adds one extra position at the end, so the last piece has a closing boundary. The second one uses each pair of neighboring positions to slice out the substrings.

Code:

text = "UpperCaseSplitString"
pos = [i for i, e in enumerate(text+'A') if e.isupper()]
parts = [text[pos[j]:pos[j+1]] for j in range(len(pos)-1)]
print(parts)

# OUTPUT: ['Upper', 'Case', 'Split', 'String']
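The same position-based idea can be wrapped in a function. The sketch below is one possible variant, not the original solution: the helper name split_by_positions is hypothetical, and it appends len(text) as the final boundary instead of adding a sentinel character to the string.

```python
def split_by_positions(text):
    """Split text before each uppercase letter using slice boundaries."""
    # Indices of all uppercase letters, plus the end of the string
    # as the closing boundary for the last piece.
    bounds = [i for i, ch in enumerate(text) if ch.isupper()] + [len(text)]
    # Pair each boundary with the next one and slice between them.
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


print(split_by_positions("UpperCaseSplitString"))  # ['Upper', 'Case', 'Split', 'String']
print(split_by_positions("abcDef"))  # ['Def']
```

As with Method 1, anything before the first uppercase letter is dropped, because slicing starts at the first uppercase position.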

Method 5: Using join+strip+split

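Here’s another way of using a list comprehension to split the string on every occurrence of an uppercase character: put a space in front of each uppercase letter, join the characters back into one string, and split that string on whitespace.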

Code:

text = "UpperCaseSplitString"
res = "".join([(" "+i if i.isupper() else i) for i in text]).strip().split()
print(res)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']

The above code can be better understood with the help of a multiline solution shown below:

# Given String
text = "UpperCaseSplitString"
# resultant list
res = []
# Iterate through the text
for i in text:
    # Add a space before the letter if it is in uppercase
    if i.isupper():
        res.append(" " + i)
    else:
        res.append(i)
# Convert the resultant list to a string
res = ''.join(res)  # ' Upper Case Split String' (note the leading space)
print(res.strip().split())

# OUTPUT: ['Upper', 'Case', 'Split', 'String']
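One practical difference between the isupper()-based methods and the regex methods above is worth noting: the character class [A-Z] only covers ASCII capitals, while str.isupper() also recognizes uppercase letters from other alphabets. The sketch below uses a made-up Cyrillic input to illustrate this.

```python
import re

text = "ПриветМир"

# isupper() is Unicode-aware, so the join+split approach still works.
by_isupper = "".join([(" " + ch if ch.isupper() else ch) for ch in text]).split()
print(by_isupper)  # ['Привет', 'Мир']

# [A-Z] matches ASCII capitals only, so nothing is found here.
by_regex = re.findall('[A-Z][^A-Z]*', text)
print(by_regex)  # []
```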

Reader’s Digest

  • The string.join(iterable) method concatenates all the string elements in the iterable (such as a list, string, or tuple) and returns the result as a new string. The string on which you call it is the delimiter string, and it separates the individual elements. For example, '-'.join(['hello', 'world']) returns the joined string 'hello-world'.
  • strip() is a built-in string method that trims whitespace on the left and right and returns a new string. Since split() without arguments already ignores leading and trailing whitespace, the strip() call in Method 5 is optional.
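A quick way to get a feel for these two methods is to try them on small inputs. The values below are chosen only for illustration:

```python
# join: the string it is called on becomes the delimiter.
joined = '-'.join(['hello', 'world'])
print(joined)  # hello-world

# strip: removes whitespace at both ends, not in the middle.
padded = "  Upper Case  "
stripped = padded.strip()
print(repr(stripped))  # 'Upper Case'

# split without arguments ignores leading and trailing whitespace anyway.
pieces = padded.split()
print(pieces)  # ['Upper', 'Case']
```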

Conclusion

Phew! We have successfully solved the mission-critical question in as many as five different ways. I hope the solutions have helped you. Please subscribe and stay tuned for more interesting discussions and solutions in the future. Happy coding!

🌎Related Read: Python Split String Case Insensitive

