Summary: You can use several functions of Python's regular expression module (re) to split a given string at its uppercase letters. Another approach is to use a list comprehension.
Minimal Example
import re

# Given String
text = "AbcLmnZxy"

# Method 1
print(re.findall('[A-Z][^A-Z]*', text))
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 2
print(re.split('(?<=.)(?=[A-Z])', text))
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 3
print(re.sub(r"([A-Z])", r" \1", text).split())
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 4
pos = [i for i, e in enumerate(text+'A') if e.isupper()]
print([text[pos[j]:pos[j+1]] for j in range(len(pos)-1)])
# OUTPUT: ['Abc', 'Lmn', 'Zxy']

# Method 5
print("".join([(" "+i if i.isupper() else i) for i in text]).strip().split())
# OUTPUT: ['Abc', 'Lmn', 'Zxy']
Problem Formulation
Problem: Given a string containing uppercase and lowercase letters, how will you split the string on every occurrence of an uppercase letter?
Example
# Input
text = "UpperCaseSplitString"

# Output
['Upper', 'Case', 'Split', 'String']

In the above example, the string is split every time an uppercase character occurs, and the resulting substrings are stored in a list.
Let’s dive into the solutions to the given problem.
Method 1: Using re.findall
Approach: Use re.findall('[A-Z][^A-Z]*', text) to split the string whenever an uppercase letter appears. The expression [A-Z][^A-Z]* matches one uppercase letter followed by zero or more characters that are not uppercase letters. Every match that is found is added to the returned list.
Code:
import re

text = "UpperCaseSplitString"
res = re.findall('[A-Z][^A-Z]*', text)
print(res)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']
Note: The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of the matched strings in the order in which they were found.
Recommended Read: Python re.findall()
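The following sketch shows what the pattern does and does not capture. The two input strings are made up for illustration and do not come from the examples above. Characters before the first uppercase letter are not part of any match, and each letter in a run of uppercase letters starts its own match.

```python
import re

pattern = '[A-Z][^A-Z]*'

# Illustrative inputs (assumptions, not from the article's examples)
with_prefix = re.findall(pattern, "helloWorld")
with_acronym = re.findall(pattern, "ParseHTMLPage")

# The lowercase prefix "hello" is not matched, so it is missing from the result
print(with_prefix)   # ['World']

# Every uppercase letter begins a new match, so "HTML" becomes four items
print(with_acronym)  # ['Parse', 'H', 'T', 'M', 'L', 'Page']
```

If the leading lowercase part has to be kept, the re.split approach may be a better fit.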
Method 2: Using re.split
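Approach: Once again, you can use the re module, this time calling its split function to split the string on every occurrence of an uppercase letter using the expression '(?<=.)(?=[A-Z])'. The lookahead (?=[A-Z]) matches the empty position directly before an uppercase letter, and the lookbehind (?<=.) requires at least one character before that position, so the string is not split at its very beginning. Splitting on an empty match like this requires Python 3.7 or later.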
Note: The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
Code:
import re

text = "UpperCaseSplitString"
res = re.split('(?<=.)(?=[A-Z])', text)
print(res)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']

Recommended Read: Python Regex Split
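To illustrate why the pattern contains the lookbehind (?<=.), here is a small comparison, assuming Python 3.7 or later (older versions do not split on empty matches). The variable names and the second input string are illustrative only.

```python
import re

# Lookahead only: there is also an empty match at position 0,
# which produces an empty string as the first list element
lookahead_only = re.split('(?=[A-Z])', "AbcLmn")
print(lookahead_only)  # ['', 'Abc', 'Lmn']

# Lookbehind + lookahead: position 0 has no preceding character, so no split there
with_lookbehind = re.split('(?<=.)(?=[A-Z])', "AbcLmn")
print(with_lookbehind)  # ['Abc', 'Lmn']

# Unlike re.findall('[A-Z][^A-Z]*', ...), a lowercase prefix is kept
with_prefix = re.split('(?<=.)(?=[A-Z])', "helloWorldAgain")
print(with_prefix)  # ['hello', 'World', 'Again']
```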
Method 3: Using re.sub
Approach: Yet another function of the re module that allows you to split the string based on the occurrence of an uppercase letter is re.sub. The idea here is to insert a space before every uppercase letter (the replacement r" \1" is a space followed by the captured letter) and then do a normal split on the string using the split() method.
Note: The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.
Code:
import re

text = "UpperCaseSplitString"
res = re.sub(r"([A-Z])", r" \1", text).split()
print(res)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']

Recommended Read: Python Regex Sub
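The two steps of this method can be looked at separately. The sketch below prints the intermediate string and then points out one caveat: because the final split() breaks at any whitespace, an input that already contains spaces is also split there. The string "Hello worldFoo" is a made-up input for illustration.

```python
import re

text = "UpperCaseSplitString"

# Step 1: put a space in front of every uppercase letter
spaced = re.sub(r"([A-Z])", r" \1", text)
print(repr(spaced))  # ' Upper Case Split String'

# Step 2: split at whitespace (leading whitespace is ignored)
parts = spaced.split()
print(parts)  # ['Upper', 'Case', 'Split', 'String']

# Caveat: existing spaces in the input also become split points
mixed = "Hello worldFoo"  # illustrative input
via_sub = re.sub(r"([A-Z])", r" \1", mixed).split()
via_findall = re.findall('[A-Z][^A-Z]*', mixed)
print(via_sub)      # ['Hello', 'world', 'Foo']
print(via_findall)  # ['Hello world', 'Foo']
```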
Method 4: Using List Comprehension
Prerequisite:
List comprehension is a compact way of creating lists. The simple formula is [expression + context].
- Expression: What to do with each list element?
- Context: What elements to select? The context consists of an arbitrary number of for and if statements.
The example [x for x in range(3)] creates the list [0, 1, 2].
Related Read: List Comprehension in Python - A Helpful Illustrated Guide
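As a brief illustration of the expression and context parts with an added if condition, here is a sketch using the article's example string. The variable names capitals and positions are chosen for this example only.

```python
text = "UpperCaseSplitString"

# Expression: c | Context: for c in text if c.isupper()
capitals = [c for c in text if c.isupper()]
print(capitals)  # ['U', 'C', 'S', 'S']

# Expression: i | Context: for i, c in enumerate(text) if c.isupper()
positions = [i for i, c in enumerate(text) if c.isupper()]
print(positions)  # [0, 5, 9, 14]
```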
Approach: The idea here is to use a couple of list comprehensions. The first list comprehension finds and stores the positions of all capital letters in the given string. An extra 'A' is appended to the text beforehand so that the end of the string is recorded as a final position. These positions are then used in a second list comprehension to slice out the substrings between each pair of neighboring positions.
Code:
text = "UpperCaseSplitString"
pos = [i for i, e in enumerate(text+'A') if e.isupper()]
parts = [text[pos[j]:pos[j+1]] for j in range(len(pos)-1)]
print(parts)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']
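To make the role of the appended 'A' visible, the sketch below prints the position list before slicing. The zip-based pairing is an alternative formulation offered here for readability, not the code used above; it should produce the same result.

```python
text = "UpperCaseSplitString"

# The appended 'A' acts as a sentinel: it adds len(text) as the last position
pos = [i for i, e in enumerate(text + 'A') if e.isupper()]
print(pos)  # [0, 5, 9, 14, 20]

# Pair each position with its successor and slice between them
parts = [text[start:end] for start, end in zip(pos, pos[1:])]
print(parts)  # ['Upper', 'Case', 'Split', 'String']
```

As with re.findall, any characters before the first uppercase letter fall outside every slice and are not included in the result.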
Method 5: Using join+strip+split
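Here’s another way of using a list comprehension to split the string on every occurrence of an uppercase character. The comprehension puts a space in front of every uppercase character, join glues the pieces back into a single string, and split breaks that string at the spaces.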
Code:
text = "UpperCaseSplitString"
res = "".join([(" "+i if i.isupper() else i) for i in text]).strip().split()
print(res)
# OUTPUT: ['Upper', 'Case', 'Split', 'String']
The above code can be better understood with the help of a multiline solution shown below:
# Given String
text = "UpperCaseSplitString"

# resultant list
res = []

# Iterate through the text
for i in text:
    # Add a space before the letter if it is in uppercase
    if i.isupper():
        res.append(" " + i)
    else:
        res.append(i)

# Convert the resultant list to a string
res = ''.join(res)
# Upper Case Split String
print(res.strip().split())
# OUTPUT: ['Upper', 'Case', 'Split', 'String']
Reader’s Digest
- The string.join(iterable) method concatenates all the string elements in the iterable (such as a list, string, or tuple) and returns the result as a new string. The string on which you call it is the delimiter string, and it separates the individual elements. For example, '-'.join(['hello', 'world']) returns the joined string 'hello-world'.
- strip is a string method in Python that trims whitespace on the left and right and returns a new string.
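The two string methods described above can be tried out in isolation. The sketch below also shows that split() called without arguments already ignores leading and trailing whitespace, so the strip() call in Method 5 appears to be optional. The variable names are illustrative.

```python
# join: the string it is called on is placed between the elements
joined = '-'.join(['hello', 'world'])
print(joined)  # hello-world

# strip: removes whitespace on both ends and returns a new string
padded = " Upper Case Split String"
stripped = padded.strip()
print(repr(stripped))  # 'Upper Case Split String'

# split() without arguments skips leading/trailing whitespace on its own
same_result = padded.split() == stripped.split()
print(same_result)  # True
```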
Conclusion
Phew! We have successfully solved the mission-critical question in as many as five different ways. I hope the solutions have helped you. Please subscribe and stay tuned for more interesting discussions and solutions in the future. Happy coding!
Related Read: Python Split String Case Insensitive