I’m sitting in front of my computer refactoring Python code and have just thought of the following question:
Can You Use a Regular Expression with the Python startswith() Method?
The short answer is no. The str.startswith() method doesn’t accept regular expressions. And you don’t need it to, because regular expressions can already check whether a string starts with a pattern using the re.match(pattern, string) function from the re module.
In fact, shortly after asking the question, I realized that using a regex with the startswith() method doesn’t make sense. Why? If you want to use regular expressions, use the re module. Regular expressions are far more powerful than the startswith() method.
For example, to check whether a string starts with 'hello', you’d pass the regex 'hello.*' to re.match(). The plain pattern 'hello' works as well, because re.match() only matches at the beginning of the string. Now you don’t need the startswith() method anymore because the regex already takes care of that.
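As a minimal sketch of this idea, the snippet below compares a literal prefix check with a regex check. The variable names and sample strings are made up for illustration.

```python
import re

greeting = "hello world"

# str.startswith() compares against a literal prefix.
literal_check = greeting.startswith("hello")

# re.match() only matches at the beginning of the string,
# so the plain pattern 'hello' already acts as a prefix test.
regex_check = re.match("hello", greeting) is not None

# A pattern can describe several prefixes at once, e.g. 'hello' or 'hallo'.
variant_check = re.match("h[ae]llo", "hallo there") is not None

print(literal_check, regex_check, variant_check)  # True True True
```

The last check is something startswith() cannot express with a single literal string.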
How Does the Python startswith() Method Work?
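Here’s an overview of the startswith() method:
str.startswith(prefix[, start[, end]])
|Argument|Required?|Description|
|---|---|---|
|prefix|required|String value to be searched for at the beginning of the string|
|start|optional|Index of the first position where the check begins (inclusive, default 0)|
|end|optional|Index of the last position where the check ends (exclusive, default: end of the string)|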
Let’s look at some examples using the Python
startswith() method. In each one, I will modify the code to show different use cases. Let’s start with the most basic scenario.
Python startswith() — Most Basic Example
Suppose you have a list of strings where each string is a tweet.
tweets = ["to thine own self be true", "coffee break python", "i like coffee"]
Let’s say you work in the coffee industry and you want to get all tweets that start with the string
"coffee". We’ll use the
startswith() method with a single argument:
>>> for tweet in tweets:
...     if tweet.startswith("coffee"):
...         print(tweet)
...
coffee break python
There is only one tweet in our dataset that starts with the string
"coffee". So that is the only one printed out.
Python startswith() — Optional Arguments
The startswith() method has two optional arguments: start and end. You can use these to define a range of indices to check. By default, startswith() checks the entire string. Let’s look at some examples.
The start argument tells
startswith() where to begin searching. The default value is 0, i.e., it begins at the start of the string. So, the following code outputs the same result as above:
>>> for tweet in tweets:
...     if tweet.startswith("coffee", 0):
...         print(tweet)
...
coffee break python
What happens if we set start=7?
>>> for tweet in tweets:
...     if tweet.startswith("coffee", 7):
...         print(tweet)
...
i like coffee
Why does it print
'i like coffee'? By calling the find() method, we see that the substring
'coffee' begins at index 7.
>>> 'i like coffee'.find('coffee')
7
Hence, when checking tweet.startswith("coffee", 7) for the tweet 'i like coffee', the result is True.
Let’s add another argument – the end index – to the last snippet:
>>> for tweet in tweets:
...     if tweet.startswith("coffee", 7, 9):
...         print(tweet)
...
Nothing is printed to the console. This is because we are only searching over 2 characters, beginning at index 7 (inclusive) and ending at index 9 (exclusive). But we are searching for 'coffee', which is 6 characters long. As 6 > 2, startswith() cannot find a match in any tweet and returns False, so nothing is printed.
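One way to picture the start and end arguments is as a slice taken before the prefix check. The helper below is hypothetical and written only for illustration; it is a sketch that behaves like the built-in method for non-empty prefixes, not a description of how Python implements it.

```python
tweet = "i like coffee"


def startswith_via_slice(text, prefix, start=0, end=None):
    """Hypothetical helper: mimic text.startswith(prefix, start, end) with a slice."""
    return text[start:end].startswith(prefix)


# Both calls look at the text from index 7 onward.
print(tweet.startswith("coffee", 7))             # True
print(startswith_via_slice(tweet, "coffee", 7))  # True

# With end=9 only two characters remain, too few for 'coffee'.
print(tweet[7:9])                                # co
print(tweet.startswith("coffee", 7, 9))          # False
```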
Now that you know everything about Python’s startswith method, let’s go back to our original question:
Can You Use a Regular Expression with the Python startswith() Method?
No. The startswith() method does not accept regular expressions. You can only search for a string.
A regular expression can describe an infinite set of matching strings. For example, 'A.*' matches all strings starting with 'A'. Matching such patterns can be computationally expensive. So, for performance reasons, it makes sense that startswith() doesn’t accept regular expressions.
Instead, you can use the re.match() method. The re.match(pattern, string) method returns a match object if the pattern matches at the beginning of the string. The match object contains useful information such as the matching groups and the matching positions. An optional argument flags allows you to customize the regex engine, for example to ignore capitalization.
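The following sketch shows what the match object and the flags argument look like in practice. The capitalized sample tweet is an assumption made for this illustration.

```python
import re

tweet = "Coffee break python"

# Without flags the match is case-sensitive, so 'coffee' does not match 'Coffee'.
print(re.match("coffee", tweet))  # None

# re.IGNORECASE tells the regex engine to ignore capitalization.
match = re.match("coffee", tweet, flags=re.IGNORECASE)

print(match.group())  # Coffee
print(match.span())   # (0, 6)
```

Here group() returns the matched text and span() returns its start and end positions in the string.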
Specification: re.match(pattern, string, flags=0)
The re.match() method has up to three arguments:
pattern: the regular expression pattern that you want to match.
string: the string that you want to search for the pattern.
flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function.
The re.match() method returns a match object if the pattern matches at the beginning of the string, and None otherwise.
But is it also true that
startswith only accepts a single string as argument? Not at all. It is possible to do the following:
Python startswith() Tuple – Check For Multiple Strings
>>> for tweet in tweets:
...     if tweet.startswith(("coffee", "i")):
...         print(tweet)
...
coffee break python
i like coffee
This snippet prints all strings that start with either "coffee" or "i". It is pretty efficient too. Unfortunately, you can only check a finite set of prefixes. If you need to check an infinite set, you cannot use this method.
What Happens If I Pass A Regular Expression To startswith()?
Let’s check whether a tweet starts with any version of the "coffee" string. In other words, we want to apply the regex "coff*" so that we match strings like "coffee", "coffees", and "coffe".
>>> tweets = ["to thine own self be true", "coffee break python", "coffees are awesome", "coffe is cool"]
>>> for tweet in tweets:
...     if tweet.startswith("coff*"):
...         print(tweet)
...
# No output :(
This doesn’t work. In regular expressions, * is a quantifier that means "zero or more repetitions of the preceding character". But in the startswith() method, it just means the literal star character '*'. Since none of the tweets start with the literal string 'coff*', Python prints nothing to the screen.
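To make the difference concrete, here is a small sketch that puts the literal star next to the regex star. The sample strings and variable names are made up for illustration.

```python
import re

# startswith() compares characters literally, so '*' is just a star.
literal_hit = "coff* is a literal prefix".startswith("coff*")
literal_miss = "coffee break python".startswith("coff*")
print(literal_hit, literal_miss)  # True False

# In a regex, '*' repeats the preceding character zero or more times,
# so 'coff*' matches 'cof' followed by any number of 'f' characters.
regex_hit = re.match("coff*", "coffee break python") is not None
print(regex_hit)  # True

# re.escape() turns the star back into a literal character for the regex engine.
escaped_miss = re.match(re.escape("coff*"), "coffee break python") is None
print(escaped_miss)  # True
```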
So you might ask:
What Are The Alternatives to Using Regular Expressions in startswith()?
There is one alternative that is simple and clean: use the re module. This is Python’s built-in module built to work with regular expressions.
>>> import re
>>> tweets = ["to thine own self be true", "coffee break python", "coffees are awesome", "coffe is cool"]
# Success!
>>> for tweet in tweets:
...     if re.match("coff*", tweet):
...         print(tweet)
...
coffee break python
coffees are awesome
coffe is cool
Success! We’ve now printed all the tweets we expected. That is, all tweets that start with "cof" followed by zero or more "f" characters, which covers "coffee", "coffees", and "coffe".
Note that this method is slower than startswith(). Evaluating a regular expression is more expensive than comparing a literal prefix. But we got the result we wanted, and the code stays clear. Slow and successful is better than fast and unsuccessful.
re.match() takes two required arguments: first, the regular expression to be matched; second, the string you want to search. If the pattern matches at the beginning of the string, it returns a match object, which is truthy. If not, it returns None, which is falsy. In this case, it returns None for “to thine own self be true” and a match object for the rest.
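The sketch below inspects the return values of the matching step directly. Precompiling the pattern with re.compile() is a common idiom when the same regex is reused in a loop; it is added here as an illustration and is not part of the snippet above.

```python
import re

tweets = [
    "to thine own self be true",
    "coffee break python",
    "coffees are awesome",
    "coffe is cool",
]

# Compile the pattern once and reuse it for every tweet.
coffee_prefix = re.compile("coff*")

# match() returns a match object on success and None otherwise.
results = [coffee_prefix.match(tweet) for tweet in tweets]
matching = [tweet for tweet, m in zip(tweets, results) if m is not None]

print(results[0])  # None
print(matching)    # ['coffee break python', 'coffees are awesome', 'coffe is cool']
```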
So let’s summarize the article.
Summary: Can You Use a Regular Expression with the Python startswith Method?
No, you cannot use a regular expression with the Python startswith() method. But you can use the Python regular expression module re instead. It’s as simple as calling the function re.match(s1, s2). This matches the regular expression s1 at the beginning of the string s2.
Python startswith() List
Given that we can pass a tuple to
startswith(), what happens if we pass a list?
>>> s = 'a string!'
>>> if s.startswith(['a', 'b', 'c']):
...     print('yay!')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
Python raises a TypeError. The first argument of startswith() must be a string or a tuple of strings. So if we have a list of prefixes we want to check, we can call tuple() before passing it to startswith().
>>> if s.startswith(tuple(['a', 'b', 'c'])):
...     print('yay!')
...
yay!
This works well and is fine performance wise. Yet, one of Python’s key features is its flexibility. So is it possible to get the same outcome without changing our list of letters to a tuple? Of course it is!
We have two options:
- any + list comprehension
- any + map
The any() function is a way to combine logical or statements together. It takes one argument: an iterable of Boolean values. So instead of writing
if s.startswith('a') or s.startswith('b') or s.startswith('c'):
    pass  # some code
we can write
# any takes 1 argument - an iterable
if any([s.startswith('a'), s.startswith('b'), s.startswith('c')]):
    pass  # some code
This is much nicer to read and is especially useful if you are combining many conditions. We can improve this by first creating a list of conditions and passing this to any():
letters = ['a', 'b', 'c']
conditions = [s.startswith(letter) for letter in letters]
if any(conditions):
    pass  # do something
Alternatively, we can use map instead of a list comprehension:
letters = ['a', 'b', 'c']
if any(map(s.startswith, letters)):
    pass  # do something
Both have the same outcome. We personally prefer list comprehensions and think they are more readable. But choose whichever you prefer.
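A third variant, offered here only as a sketch, passes a generator expression to any(). This lets any() stop at the first prefix that matches instead of first building the full list of results. The variable names are chosen for illustration.

```python
s = "a string!"
letters = ["a", "b", "c"]

# any() consumes the generator lazily and stops at the first True value.
starts_with_letter = any(s.startswith(letter) for letter in letters)

# The tuple conversion gives the same answer in a single call.
via_tuple = s.startswith(tuple(letters))

print(starts_with_letter, via_tuple)  # True True
```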
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them boost their skills.