✨Summary: Use the re.split() function of Python's re module to split the string on the given delimiter, and pass flags=re.IGNORECASE to make the split case-insensitive.
Minimal Example
import re

text = "PythonsepJavaSEPGolang"
print(re.split("(sep)", text, flags=re.IGNORECASE))
# ['Python', 'sep', 'Java', 'SEP', 'Golang']
Problem Formulation
📜Problem: Given a string containing a mix of uppercase and lowercase characters, how do you split it on a separator while ignoring the separator's case?
Example
# Given
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
# EXPECTED OUTPUT: ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
In the above problem, the separator used to split the given string is the word “finxter”. Note that the word appears three times in the string, each time with different letter casing: ‘Finxter’, ‘finxter’, and ‘fInxTeR’. Regardless of the casing, each occurrence must be treated as a separator, so the separator is case-insensitive.
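To see why an ordinary split is not enough here, the following minimal sketch applies the built-in str.split() with the lowercase separator. Only the occurrence whose case matches exactly is treated as a separator; the variable name parts is just for illustration.

```python
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"

# str.split() is case-sensitive, so only the all-lowercase 'finxter'
# is recognized as a separator; 'Finxter' and 'fInxTeR' stay in place.
parts = text.split("finxter")
print(parts)
# ['LoremFinxteripsum', 'dolorsitfInxTeRamet']
```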
Now that you have a clear picture of what the problem asks you to do, let us dive into the different ways of solving it.
Method 1: Using re.split()
Approach: Use the split() function of the re module to split the given string on the separator ‘finxter’. To handle the case insensitivity of the separator, pass flags=re.IGNORECASE to re.split(). Put the separator expression within parentheses (i.e., "(finxter)") so that the matched separators are also stored in the resulting list.
Code:
import re

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
res = re.split("(finxter)", text, flags=re.IGNORECASE)
print(res)
# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
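If the separator is not known in advance, the same idea can be wrapped in a small function. This is only a sketch: split_ignore_case and its keep_sep parameter are hypothetical names, not part of the re module. It uses re.escape() so that separators containing regex metacharacters such as '.' are matched literally.

```python
import re

def split_ignore_case(text, sep, keep_sep=True):
    """Split text on sep, ignoring case (hypothetical helper)."""
    # re.escape() makes characters such as '.' or '+' in sep match literally.
    pattern = re.escape(sep)
    if keep_sep:
        # A capturing group makes re.split() keep the matched separators.
        pattern = f"({pattern})"
    return re.split(pattern, text, flags=re.IGNORECASE)

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"

with_sep = split_ignore_case(text, "finxter")
print(with_sep)
# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']

without_sep = split_ignore_case(text, "finxter", keep_sep=False)
print(without_sep)
# ['Lorem', 'ipsum', 'dolorsit', 'amet']

# A separator that is a regex metacharacter is still treated as plain text.
dotted = split_ignore_case("a.B.c", ".", keep_sep=False)
print(dotted)
# ['a', 'B', 'c']
```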
Note:
- The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
- re.IGNORECASE is a regex flag. When this flag is used, the regex engine performs case-insensitive matching, so a pattern such as [A-Z] also matches lowercase letters.
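The points in this note can be checked with a few one-liners. The snippet below is a small sketch with illustrative variable names; it also shows the inline flag (?i), which is an equivalent way to request case-insensitive matching inside the pattern itself.

```python
import re

# Without a capturing group, the matched separators are dropped.
plain = re.split("a", "bbabbbab")
print(plain)  # ['bb', 'bbb', 'b']

# With re.IGNORECASE, an uppercase pattern also matches lowercase text.
ignore_case = re.split("A", "bbabbbAb", flags=re.IGNORECASE)
print(ignore_case)  # ['bb', 'bbb', 'b']

# Character classes are affected too: [A-Z] now matches lowercase letters.
letters = re.findall("[A-Z]", "aB1c", flags=re.IGNORECASE)
print(letters)  # ['a', 'B', 'c']

# The inline flag (?i) at the start of the pattern has the same effect.
inline = re.split("(?i)a", "bbAbbbab")
print(inline)  # ['bb', 'bbb', 'b']
```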
⚡Related Reads:
(i) Python Regex Split
(ii) Intermezzo: Python Regex Flags
Method 2: Index-Based String Split
Approach:
- Convert the entire given string to lowercase.
- Find the starting index of each occurrence of the separator string (‘finxter’ in this case) in the lowercased string and store these indices in a list. This can be done using a list comprehension, as shown in the solution below.
- Initialize a variable flag with the value 0. It tracks the index at which the next substring starts.
- Use a for loop to iterate through the list of starting indices from step 2.
- Slice the given string from the index stored in flag up to the loop variable “i”, and append this substring to the resultant list.
- Since you also need to store the separator strings as items within the resultant list, slice the given string again in the same iteration from index “i” up to “i + length of the separator string”. Here the separator is “finxter”, so the stop index is “i+7”. Then set flag to “i+7” so that the next slice starts right after the separator.
- After the loop, the part of the string that follows the last separator is still missing. Slice the string from the current value of flag to the end, i.e. “text[flag:]”, and append this substring to the resultant list as the final item.
Code:
import re

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
res = []
start = [m.start() for m in re.finditer('finxter', text.lower())]
flag = 0
for i in start:
    res.append(text[flag:i])
    # If you want to keep the delimiter
    res.append(text[i:i+7])
    flag = i+7
res.append(text[flag:])
print(res)
# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
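The hard-coded length 7 ties the code above to the separator ‘finxter’. A possible generalization is sketched below; split_by_index is a hypothetical helper name, and the sketch replaces re.finditer() with str.find() to locate the separators. It assumes that lowercasing does not change the length of the string, which holds for ASCII text but not for every Unicode character.

```python
def split_by_index(text, sep):
    """Case-insensitive split that keeps the separators (hypothetical helper)."""
    if not sep:
        raise ValueError("empty separator")
    lowered = text.lower()
    sep_lower = sep.lower()
    width = len(sep_lower)
    res = []
    flag = 0  # start index of the next piece
    i = lowered.find(sep_lower, flag)
    while i != -1:
        res.append(text[flag:i])       # text before the separator
        res.append(text[i:i + width])  # the separator in its original case
        flag = i + width
        i = lowered.find(sep_lower, flag)
    res.append(text[flag:])            # remainder after the last separator
    return res

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
result = split_by_index(text, "finxter")
print(result)
# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
```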
Method 3: Converting the String to Lowercase
Disclaimer: The following solutions work only if you do not need to keep the delimiter and preserving the original case of the string is not required.
Approach: Convert the entire string to lowercase and then split it with “finxter” as the delimiter/separator.
3.1 Using lower()
Solution
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
text_1 = text.lower()
print(text_1.split('finxter'))
# ['lorem', 'ipsum', 'dolorsit', 'amet']
3.2 Using casefold()
Solution
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
text_1 = text.casefold()
print(text_1.split('finxter'))
# ['lorem', 'ipsum', 'dolorsit', 'amet']
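For plain ASCII text such as the example above, lower() and casefold() give the same result. They can differ for some non-ASCII characters: casefold() is the more aggressive of the two and, for instance, maps the German ‘ß’ to ‘ss’. The sketch below uses a made-up string and separator to illustrate the difference.

```python
# Made-up example: the separator appears once as 'STRASSE' and once as 'straße'.
text = "AppleSTRASSEBananastraßeCherry"
sep = "Strasse"

# lower() keeps 'ß' as it is, so 'straße' does not match 'strasse'.
with_lower = text.lower().split(sep.lower())
print(with_lower)
# ['apple', 'bananastraßecherry']

# casefold() converts 'ß' to 'ss', so both spellings match.
with_casefold = text.casefold().split(sep.casefold())
print(with_casefold)
# ['apple', 'banana', 'cherry']
```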
Conclusion
I hope the solutions in this article have helped you. Please subscribe and stay tuned for more interesting reads in the future. Happy coding!