Python Split String Case Insensitive

Summary: Use the re.split() function of Python's re module to split the string on the given delimiter, and pass the flag re.IGNORECASE to make the split case-insensitive. Wrapping the delimiter in parentheses keeps the matched delimiters in the result.

Minimal Example

import re
text = "PythonsepJavaSEPGolang"
print(re.split("(sep)", text, flags=re.IGNORECASE))

# ['Python', 'sep', 'Java', 'SEP', 'Golang']

Problem Formulation

📜Problem: Given a string containing a mix of uppercase and lowercase characters, how do you split it on a separator regardless of the separator's case?

Example

# Given
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
# EXPECTED OUTPUT:
['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']

In the example above, the separator is the word “finxter”. It appears three times in the string, each time with different casing: ‘Finxter’, ‘finxter’, and ‘fInxTeR’. All three occurrences must be treated as the separator regardless of case, which is what makes the split case-insensitive.

Now that you have a clear picture of what the problem asks you to do, let us dive into the different ways of solving it.

Method 1: Using re.split()

Approach: Use the split() function of the re module to split the given string on the separator ‘finxter’. To make the separator match case-insensitively, pass flags=re.IGNORECASE to re.split(). Put the separator expression within parentheses (i.e. "(finxter)") so that it forms a capturing group, which ensures that each matched separator is also stored in the resultant list.

Code:

import re
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
res = re.split("(finxter)", text, flags=re.IGNORECASE)
print(res)

# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']

Note:

  • The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].
  • re.IGNORECASE is a regex flag. When this flag is used, the regex engine performs case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
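The two notes above can be checked with a short sketch. It reuses the string from the minimal example; the separator "a.b" and its sample input are hypothetical, added here only to illustrate that a separator containing regex metacharacters should be passed through re.escape() so it is matched literally.

```python
import re

text = "PythonsepJavaSEPGolang"

# Without a capturing group, the separators are dropped from the result.
without_sep = re.split("sep", text, flags=re.IGNORECASE)
print(without_sep)  # ['Python', 'Java', 'Golang']

# With a capturing group, each matched separator is kept in its original case.
with_sep = re.split("(sep)", text, flags=re.IGNORECASE)
print(with_sep)  # ['Python', 'sep', 'Java', 'SEP', 'Golang']

# Hypothetical separator "a.b": re.escape() makes the dot match literally,
# so "axb" is not treated as a separator.
literal = re.split(re.escape("a.b"), "1A.B2a.b3axb4", flags=re.IGNORECASE)
print(literal)  # ['1', '2', '3axb4']
```

Note that the version without parentheses drops the separators but still preserves the original case of the remaining pieces.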

Related Reads:
(i) Python Regex Split
(ii) Intermezzo: Python Regex Flags

Method 2: Index-Based String Split

Approach:

  • Convert the entire given string to lowercase. The lowercase copy is used only for searching; the slices are taken from the original string.
  • Find the starting index of every occurrence of the separator string (‘finxter’ in this case) in the lowercase string and store these indices in a list. This can be done using a list comprehension as shown in the solution below.
  • Initialize a variable named flag with the value 0. This variable holds the start index of the next slice.
  • Use a for loop to iterate through the list containing the starting indices of the separator (see step 2).
  • Slice the given string from the index stored in flag up to the loop variable “i”. Append this substring to the resultant list.
  • Since you also need to store the separators as items in the resultant list, slice the given string again in the same iteration, from “i” up to “i + length of the separator”. Here the separator is “finxter”, so the stop index is “i+7”. Then set flag to this stop index.
  • After the loop, the part of the string that follows the last separator is still missing. Slice the string from the current value of flag to the end, i.e. “text[flag:]”, and append this substring to the resultant list as the final item.

Code:

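import re
text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
sep = "finxter"
res = []
# Start index of every separator occurrence in the lowercased string
start = [m.start() for m in re.finditer(sep, text.lower())]
flag = 0  # start index of the next slice
for i in start:
    # Text between the previous separator and this one
    res.append(text[flag:i])
    # If you want to keep the delimiter (in its original case)
    res.append(text[i:i+len(sep)])
    flag = i+len(sep)
# Remainder after the last separator
res.append(text[flag:])
print(res)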

# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
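The same index-based idea can be wrapped in a reusable function. This is a minimal sketch, not the article's original code: the function name split_keep_separator is hypothetical, and instead of lowercasing the text it searches the original string with re.IGNORECASE. That swap is an assumption made for safety, since lowercasing can change the length of a string for a few non-ASCII characters, which would shift the indices.

```python
import re

def split_keep_separator(text, sep):
    """Split text on sep, ignoring case, and keep the matched separators."""
    parts = []
    pos = 0  # start index of the piece that has not been emitted yet
    # Search the original text so match indices always line up with it.
    for m in re.finditer(re.escape(sep), text, flags=re.IGNORECASE):
        parts.append(text[pos:m.start()])  # text before the separator
        parts.append(m.group())            # the separator in its original case
        pos = m.end()
    parts.append(text[pos:])               # text after the last separator
    return parts

result = split_keep_separator("LoremFinxteripsumfinxterdolorsitfInxTeRamet", "finxter")
print(result)
# ['Lorem', 'Finxter', 'ipsum', 'finxter', 'dolorsit', 'fInxTeR', 'amet']
```

Using m.start() and m.end() removes the need to hard-code the separator length.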

Method 3: Lowercase the String, Then Split

Disclaimer: The following solutions work only if you do not need to keep the delimiter and do not need to preserve the original case of the string.

Approach: Convert the entire string to lowercase and then split it with “finxter” as the delimiter/separator.

3.1 Using lower()

Solution

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
text_1 = text.lower()
print(text_1.split('finxter'))
# ['lorem', 'ipsum', 'dolorsit', 'amet']

3.2 Using casefold()

Solution

text = "LoremFinxteripsumfinxterdolorsitfInxTeRamet"
text_1 = text.casefold()
print(text_1.split('finxter'))
# ['lorem', 'ipsum', 'dolorsit', 'amet']
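For the ASCII example above, lower() and casefold() give the same result. The two can differ on some non-ASCII text, because casefold() applies more aggressive folding intended for caseless comparison. The sketch below illustrates this with a hypothetical German sample string that is not part of the original example.

```python
word = "Straße"
print(word.lower())     # straße
print(word.casefold())  # strasse

# Hypothetical input where the separator appears as "STRASSE" and "Straße".
text = "einsSTRASSEzweiStraßedrei"
lower_parts = text.lower().split("strasse")
fold_parts = text.casefold().split("strasse")
print(lower_parts)  # ['eins', 'zweistraßedrei']
print(fold_parts)   # ['eins', 'zwei', 'drei']
```

If the input may contain such characters, casefold() is usually the safer choice of the two.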

Conclusion

I hope the solutions in this article have helped you. Please subscribe and stay tuned for more interesting reads in the future. Happy coding!
