I am sitting in front of my computer refactoring my Python code and I am asking myself the following question.
Is it possible to use a regular expression within the Python startswith() method?
How does the Python startswith() method work?
This article leads you step by step through the Python startswith method. In each step, I will slightly modify the original code snippet to showcase different uses. Let’s start with the basic scenario: Suppose you have a set of tweets in the form of strings.
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
Say you work in the coffee industry. You want to find all tweets that start with the string ‘coffee’. The startswith() method does exactly this: it returns True if the string begins with the given prefix and False otherwise. In its most basic form, the startswith method takes a single argument:
for tweet in tweets:
    if tweet.startswith("coffee"):
        print(tweet)
When executing this snippet, the resulting output on your console is “coffee break python”. It’s the only tweet from our toy database that starts with the string “coffee”.
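If you would rather keep the matching tweets than print them, a list comprehension does the same filtering. This is only a small sketch; the name coffee_tweets is made up for illustration.

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]

# Collect every tweet that begins with the literal prefix "coffee".
coffee_tweets = [tweet for tweet in tweets if tweet.startswith("coffee")]

print(coffee_tweets)  # ['coffee break python']
```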
The startswith method has two optional arguments: start and end (some tutorials call the first one beg). You can use these two arguments to check whether a slice of the original string starts with your prefix. Need an example that explains both arguments?
The start argument sets the index at which the comparison begins. By default, it is 0. Hence, the following code snippet produces exactly the same result as the last code snippet:
for tweet in tweets:
    if tweet.startswith("coffee", 0):
        print(tweet)
However, what do you think happens when we set the start index to 7? Have a look at the following code snippet:
for tweet in tweets:
    if tweet.startswith("coffee", 7):
        print(tweet)
Executing this snippet prints the string ‘i like coffee’ to the standard output. The reason is that the substring starting at index 7 begins with the string “coffee”. Hence, when checking tweet.startswith("coffee", 7) for the tweet ‘i like coffee’, the result is True.
So let’s add another argument – the end index – to the last snippet:
for tweet in tweets:
    if tweet.startswith("coffee", 7, 9):
        print(tweet)
Nothing is printed to the console. The reason is the short substring carved out of each tweet. This substring is only two characters long, beginning at index 7 (inclusive) and ending at index 9 (exclusive). But the searched string “coffee” is six characters long, so it cannot fit. With these arguments, the startswith method returns False for every tweet.
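One way to picture the two index arguments is as a slice of the tweet. The sketch below compares startswith with indices against slicing first and then calling startswith. For ordinary in-range indices like these, both give the same answer; the variable names are made up for illustration.

```python
tweet = "i like coffee"

# Start index only: the comparison begins at index 7.
window_from_7 = tweet[7:]
with_start = tweet.startswith("coffee", 7)
with_slice = tweet[7:].startswith("coffee")
print(window_from_7)           # coffee
print(with_start, with_slice)  # True True

# Start and end index: only the two characters at indices 7 and 8 are visible.
window_7_to_9 = tweet[7:9]
with_start_end = tweet.startswith("coffee", 7, 9)
with_short_slice = tweet[7:9].startswith("coffee")
print(window_7_to_9)                     # co
print(with_start_end, with_short_slice)  # False False
```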
Now you know everything you need to know about Python’s startswith method. So coming back to your question:
Can I use a regular expression with Python startswith()?
No. The startswith method does not accept a regular expression. It only checks for a literal string prefix. This makes sense. A regular expression can describe an infinite set of matching strings. For example, the regex ‘A.*’ matches every string that starts with the character ‘A’. Evaluating such a pattern takes more work than comparing a fixed prefix character by character.
But is it also true that the startswith method allows only a single string as an argument? Not at all.
How to check startswith for multiple strings (tuple argument)?
It is possible to pass a tuple of strings:
for tweet in tweets:
    if tweet.startswith(("coffee", "i")):
        print(tweet)
This snippet prints all strings that start with either “coffee” or “i”. It is pretty efficient to do this. Unfortunately, a tuple can only list a finite number of fixed prefixes. If you need a more powerful pattern, this approach will not work.
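One detail worth knowing: the prefixes must be given as a tuple. A list raises a TypeError. The sketch below assumes the prefixes arrive as a list (the name prefixes is hypothetical) and converts it with tuple() before the check.

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
prefixes = ["coffee", "i"]  # hypothetical list of prefixes, e.g. read from a file

# startswith accepts a str or a tuple of str, but not a list.
try:
    tweets[0].startswith(prefixes)
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)  # False

# Converting the list to a tuple makes the call valid.
matches = [tweet for tweet in tweets if tweet.startswith(tuple(prefixes))]
print(matches)  # ['coffee break python', 'i like coffee']
```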
To be frank, it would not make too much sense to use a regular expression within the startswith method. Why? Suppose you are ok with using a regular expression. Now you could simply use this regex to achieve the exact same thing you plan to do with the startswith method. But without the unnecessary wrapper function startswith().
In the next example, let’s check whether the tweet starts with any word built on the stub “coff”. In other words, we would like a pattern such as “coff*” to accept strings like “coffee”, “coffees”, and “coffeebreakpython”.
You searched for something like this, right?
tweets = []
tweets.append("to thine own self be true")
tweets.append("coffee break python")
tweets.append("i like coffee")
tweets.append("i love coffe")

# WRONG!!
for tweet in tweets:
    if tweet.startswith("coff*"):
        print(tweet)
This is not working. The asterisk operator ‘*’ in the string is not interpreted as a wildcard representing any character. Instead, it means nothing but the simple star character ‘*’ in this context.
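To see that the asterisk is taken literally, compare a normal tweet with a made-up string that really does begin with the five characters “coff*”. Both test strings are only for illustration.

```python
# The prefix "coff*" is compared character by character, asterisk included.
normal_tweet = "coffee break python".startswith("coff*")
literal_star = "coff* is a typo".startswith("coff*")

print(normal_tweet)  # False: the fifth character is "e", not "*"
print(literal_star)  # True: the text literally begins with "coff*"
```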
So you might ask:
What are alternatives to using a regular expression in startswith()?
There is one alternative that is incredibly simple and clean. Use the re package. Here is an example code snippet that demonstrates the use of the re package.
# CORRECT!!
import re

tweets = []
tweets.append("to thine own self be true")
tweets.append("coffee break python")
tweets.append("i like coffee")
tweets.append("i love coffe")

for tweet in tweets:
    if re.match("coff*", tweet):
        print(tweet)
The result of this code snippet is the string “coffee break python”. It’s the only string that begins with “cof” followed by zero or more “f” characters, which is what the regex “coff*” describes. The rest of the tweet does not matter, because re.match only needs a match at the beginning of the string. Although this method may be a bit slower than startswith (evaluating a regular expression takes more work than comparing a fixed prefix), the code stays short and clear. Note that the function re.match(…) requires two arguments. First, you have to define the regular expression to be matched. Second, you have to define the string in which you want to look for the regex. If the string begins with a match, re.match returns a match object, which counts as true in an if statement. Otherwise it returns None. In this case, it returns None for all strings besides “coffee break python”.
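Strictly speaking, re.match hands back a match object or None rather than True or False; the if statement simply treats them as true and false. The sketch below stores the results in a dictionary (the names results and short_match are made up) and also shows that “coff*” accepts the shorter string “cof”, because the asterisk applies only to the last “f”.

```python
import re

tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
    "i love coffe",
]

# Map each tweet to whatever re.match returns for it.
results = {tweet: re.match("coff*", tweet) for tweet in tweets}
for tweet, result in results.items():
    print(f"{tweet!r}: {result!r}")

# "coff*" means "cof" followed by zero or more "f" characters.
short_match = re.match("coff*", "cof")
print(short_match.group())  # cof
```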
So let’s summarize the article.
Can I use a regular expression in the Python startswith method?
No, you cannot use a regular expression within the Python startswith method. But you can use the Python regular expression module re to do the job for you. It’s as simple as calling the function re.match(s1, s2), which checks whether the regular expression s1 matches at the beginning of the string s2.
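The “at the beginning” part is what makes re.match a fit for a startswith-style check. As a small sketch, here is how it differs from re.search, which looks for the pattern anywhere in the string. The variable names are made up for illustration.

```python
import re

tweet = "i like coffee"

# re.match only tries the pattern at position 0 of the string.
at_start = re.match("coffee", tweet)

# re.search scans the whole string for the first occurrence.
anywhere = re.search("coffee", tweet)

print(at_start)         # None
print(anywhere.span())  # (7, 13)
```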