I am sitting in front of my computer refactoring my Python code and asking myself the following question:
Is it possible to use a regular expression within the Python startswith() method?
The short answer is no. Instead, use the re.match(regex, string) function from the re module.
In fact, I realized that using a regex in combination with the str.startswith() method does not make much sense in the first place.
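Table of Contents
- How Does the Python startswith() Method Work?
- Can I Use a Regular Expression with Python startswith()?
- How to Check startswith() for Multiple Strings (Tuple Argument)?
- What are Alternatives to Using a Regular Expression in startswith()?
- Summary: Can I use a regular expression in the Python startswith method?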
Although this may seem like a trivial question, it shows why it pays to learn the Python basics thoroughly.
How Does the Python startswith() Method Work?
Before we start, let’s have a short overview of the str.startswith() method:
str.startswith(prefix[, start[, end]])
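| Argument | Required? | Description |
|---|---|---|
| prefix | required | String value (or tuple of strings) to be searched at the beginning of the string |
| start | optional | Index of the first position where prefix is checked (default: 0) |
| end | optional | Index of the position where the check stops (exclusive; default: end of the string) |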
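To make the three parameters concrete, here is a minimal sketch that uses one of the example tweets from this article. The comments show which part of the string each call actually compares against.

```python
s = "i like coffee"

# Only the prefix: compare against the beginning of the whole string.
print(s.startswith("i like"))        # True

# prefix + start: the comparison begins at index 7 (where "coffee" starts).
print(s.startswith("coffee", 7))     # True

# prefix + start + end: only s[7:9] == "co" is considered (end is exclusive).
print(s.startswith("coffee", 7, 9))  # False
print(s.startswith("co", 7, 9))      # True
```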
Let’s dive step by step into the Python startswith() method. In each step, I will slightly modify the original code snippet to showcase a different use. Let’s start with the basic scenario: suppose you have a list of tweets in the form of strings.
```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
```
Say you work in the coffee industry and want to find all tweets that start with the string "coffee". The startswith() method does exactly this. In its most basic form, it takes a single argument:
```python
for tweet in tweets:
    if tweet.startswith("coffee"):
        print(tweet)
```
When you execute this snippet, the output on your console is "coffee break python". It’s the only tweet from our toy database that starts with the string "coffee".
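If you want to collect the matching tweets instead of printing them, a list comprehension expresses the same filter in one line. This is a small variation on the snippet above, not a different technique:

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]

# Keep only the tweets whose first characters are "coffee".
coffee_tweets = [tweet for tweet in tweets if tweet.startswith("coffee")]
print(coffee_tweets)  # ['coffee break python']
```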
The startswith() method has two optional arguments: start and end. You can use them to check whether a substring of the original string starts with your argument. Need an example that explains both arguments? The start argument sets the start index of the checked substring. By default, it is 0. Hence, the following code snippet produces exactly the same result as the last one:
```python
for tweet in tweets:
    if tweet.startswith("coffee", 0):
        print(tweet)
```
However, what do you think happens when we set the start argument to 7? Have a look at the following code snippet:
```python
for tweet in tweets:
    if tweet.startswith("coffee", 7):
        print(tweet)
```
Executing this snippet prints the string 'i like coffee' to the standard output. The reason is that the substring starting at index 7 begins with the string "coffee". Hence, tweet.startswith("coffee", 7) returns True for the tweet 'i like coffee'.
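Here is a quick sketch that checks the index arithmetic behind this result and confirms that no other tweet qualifies:

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
tweet = "i like coffee"

# "coffee" begins at index 7: "i"=0, " "=1, "like"=2..5, " "=6.
print(tweet.find("coffee"))  # 7
print(tweet[7:])             # coffee

# Only the third tweet has "coffee" starting exactly at index 7.
matches = [t for t in tweets if t.startswith("coffee", 7)]
print(matches)               # ['i like coffee']
```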
So let’s add another argument – the end index – to the last snippet:
```python
for tweet in tweets:
    if tweet.startswith("coffee", 7, 9):
        print(tweet)
```
Nothing is printed to the console. The reason is the short substring carved out of each tweet: it is only two characters long, beginning at index 7 (inclusive) and ending at index 9 (exclusive). But the searched string "coffee" is six characters long, so the startswith() method, parametrized this way, returns False for every tweet.
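To see the window that start and end carve out, a short sketch prints it directly and then widens end until "coffee" fits:

```python
tweet = "i like coffee"

window = tweet[7:9]  # the part that startswith(..., 7, 9) compares against
print(window, len(window))                # co 2

print(tweet.startswith("coffee", 7, 9))   # False: "coffee" needs 6 characters
print(tweet.startswith("coffee", 7, 13))  # True: tweet[7:13] == "coffee"
```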
Now you know everything you need to know about Python’s
startswith method. So coming back to your question:
Can I Use a Regular Expression with Python startswith()?
The startswith() method does not allow for a regular expression. You can only search for a string prefix. This makes sense: a regular expression can describe an infinite set of matching strings. For example, the regex 'A.*' matches all strings starting with the character 'A'. Matching such a pattern requires a regex engine rather than a simple character-by-character comparison with a fixed prefix, which can be computationally more expensive.
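A common pitfall is writing 'A*' when you mean "starts with A". The sketch below contrasts the two patterns with re.match(), which only looks for a match at the beginning of the string:

```python
import re

# "A*" means "zero or more A characters", so re.match() finds an empty
# match at the start of ANY string, even one without an "A".
m = re.match("A*", "Banana")
print(m is not None, repr(m.group()))  # True ''

# "A.*" requires a leading "A", followed by any characters.
print(re.match("A.*", "Apple") is not None)   # True
print(re.match("A.*", "Banana") is not None)  # False
```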
How to Check startswith() for Multiple Strings (Tuple Argument)?
But is it also true that the startswith method allows only a single string as an argument? Not at all. It is possible to do the following:
```python
for tweet in tweets:
    if tweet.startswith(("coffee", "i")):
        print(tweet)
```
This snippet prints all strings that start with either "coffee" or "i". This is pretty efficient. Unfortunately, it only lets you check a finite set of candidate prefixes. If you need a more powerful pattern, this approach won’t work.
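The following sketch collects the tweets selected by the tuple argument. It also shows that the prefixes must be passed as a tuple: passing a list raises a TypeError.

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]

# A tuple of prefixes: True if the string starts with ANY of them.
selected = [t for t in tweets if t.startswith(("coffee", "i"))]
print(selected)  # ['coffee break python', 'i like coffee']

# Only tuples are accepted; a list of prefixes raises a TypeError.
try:
    "coffee".startswith(["coffee", "i"])
    raised = False
except TypeError:
    raised = True
print(raised)    # True
```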
It would not make much sense to use a regular expression within the startswith() method.
Why? If you want to do this, you are already fine with using a regular expression. Then you could simply use that regex to achieve exactly what you planned to do with the startswith() method, but without the unnecessary wrapper method.
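To illustrate this argument: for a plain, literal prefix, re.match() answers the same yes/no question as startswith(), because it only matches at the beginning of the string. The sketch below compares both on the example tweets. It uses re.escape() so that any regex metacharacters in the prefix would be treated literally.

```python
import re

tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
prefix = "coffee"

results = []
for tweet in tweets:
    # re.match() anchors at index 0, just like startswith() checks the beginning.
    via_regex = re.match(re.escape(prefix), tweet) is not None
    via_method = tweet.startswith(prefix)
    results.append((tweet, via_regex, via_method))
    print(f"{tweet!r}: regex={via_regex}, startswith={via_method}")
```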
In the next example, let’s check whether a tweet starts with any stub of the "coffee" string. In other words, we want to apply the regex "coff*" so that strings like "coffeebreakpython" evaluate to True.
You searched for something like this, right?
```python
tweets = []
tweets.append("to thine own self be true")
tweets.append("coffee break python")
tweets.append("i like coffee")
tweets.append("i love coffe")

# WRONG!! startswith() treats "coff*" as a literal string, not a pattern.
for tweet in tweets:
    if tweet.startswith("coff*"):
        print(tweet)
```
This is not working: nothing is printed. The asterisk * in the string is not interpreted as a wildcard. Instead, it is nothing but a plain star character in this context.
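A two-line check makes the literal behavior visible: no tweet starts with the characters "coff*", but a string that really contains a star does:

```python
tweets = ["to thine own self be true", "coffee break python", "i like coffee", "i love coffe"]

# startswith() compares characters literally; "*" is just another character.
print(any(t.startswith("coff*") for t in tweets))  # False
print("coff* is a star".startswith("coff*"))        # True
```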
So you might ask:
What are Alternatives to Using a Regular Expression in startswith()?
There is one alternative that is incredibly simple and clean: use the re module. Here is an example code snippet that demonstrates the re.match() function:
```python
# CORRECT!!
import re

tweets = []
tweets.append("to thine own self be true")
tweets.append("coffee break python")
tweets.append("i like coffee")
tweets.append("i love coffe")

# re.match() tries to match the pattern at the beginning of each tweet.
for tweet in tweets:
    if re.match("coff*", tweet):
        print(tweet)
```
The result of this code snippet is the string "coffee break python". It’s the only string that starts with "cof" followed by zero or more "f" characters, which is exactly what the regex "coff*" describes. Although this method may be a bit slower (evaluating regular expressions is more expensive than a plain prefix comparison), the clarity of the code has improved a lot.
Note that the function re.match(...) requires two arguments. First, you define the regular expression to be matched. Second, you define the string in which you want to look for the regex. re.match() only looks for a match at the beginning of the string: if it finds one, it returns a match object (which evaluates to True in an if statement); otherwise, it returns None. In this case, it returns None for all strings except "coffee break python".
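Because * in a regex quantifies the preceding character, "coff*" accepts more than it may seem. Here is a sketch with re.compile() (a precompiled pattern is handy when you reuse it in a loop) that shows what the pattern accepts and how re.match() differs from re.search():

```python
import re

pattern = re.compile("coff*")  # "cof" followed by zero or more "f" characters

m = pattern.match("coffee break python")
print(m.group())                                    # coff
print(pattern.match("cofe") is not None)            # True: a single "f" suffices
print(pattern.match("i like coffee") is not None)   # False: match() anchors at index 0
print(pattern.search("i like coffee") is not None)  # True: search() scans the string
```

For this particular pattern, tweet.startswith("cof") gives the same answers, because re.match() ignores whatever follows the matched prefix.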
So let’s summarize the article.
Summary: Can I use a regular expression in the Python startswith method?
No, you cannot use a regular expression within the Python startswith() method of the string class. But you can use Python’s regular expression module re to do the job for you. It’s as simple as calling the function re.match(s1, s2), which attempts to match the regular expression s1 at the beginning of the string s2.
Where to go from here?
Do you struggle to understand Python code? I have written a book titled “Coffee Break Python: 50 Workouts to Kickstart Your Rapid Code Understanding in Python”. The book is for you if you want to become a Python pro but have little time to learn Python. A short daily coffee break is enough to boost your skills!