Python Startswith

You Cannot Use Python Regex in startswith(). Do This Instead.

I’m sitting in front of my computer refactoring Python code and have just thought of the following question:

Can You Use a Regular Expression with the Python startswith() Method?

The short answer is no. Instead, you should use the match(regex, string) function from the re module.

In fact, I realized that using a regex with the startswith() method doesn’t make sense. Why? If you want to use regular expressions, use the re module. That’s what they were created for! Regular expressions are infinitely more powerful than the startswith() method!

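For example, to check whether a string starts with 'hello', you’d call re.match('hello', string). Because re.match() only looks for a match at the beginning of the string, the plain regex 'hello' is enough (a trailing '.*' is not needed). Now you don’t need the startswith() method anymore because the regex already takes care of that.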
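As a quick sanity check, the two approaches can be compared side by side. This is only a minimal sketch; the variable names (greeting, literal_check, regex_check) are made up for illustration.

```python
import re

greeting = "hello world"

# str.startswith() compares against a literal prefix.
literal_check = greeting.startswith("hello")

# re.match() only tries to match at the start of the string,
# so the pattern needs no special anchor or trailing '.*'.
regex_check = re.match(r"hello", greeting) is not None

print(literal_check, regex_check)  # True True
```

For a fixed prefix like this one, both checks agree, and startswith() is the simpler choice.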

How Does the Python startswith() Method Work?

Here’s an overview of the string.startswith() method:

str.startswith(prefix[, start[, end]])
prefix (required): String value to be searched at the beginning of string str.
start (optional): Index of the first position where prefix is to be checked. Default: start=0.
end (optional): Index at which the check stops (exclusive). Default: end=len(str).

Let’s look at some examples using the Python startswith() method. In each one, I will modify the code to show different use cases. Let’s start with the most basic scenario. 

Python startswith() — Most Basic Example

Suppose you have a list of strings where each string is a tweet. 

tweets = ["to thine own self be true",
          "coffee break python",
          "i like coffee"]

Let’s say you work in the coffee industry and you want to get all tweets that start with the string "coffee". We’ll use the startswith() method with a single argument:

>>> for tweet in tweets:
...   if tweet.startswith("coffee"):
...       print(tweet)
coffee break python

There is only one tweet in our dataset that starts with the string "coffee". So that is the only one printed out. 

Python startswith() — Optional Arguments

The startswith() method has two optional arguments: start and end. You can use these to define a range of indices to check. By default startswith checks the entire string. Let’s look at some examples.

The start argument tells startswith() where to begin searching. The default value is 0, i.e., it begins at the start of the string. So, the following code outputs the same result as above:

>>> for tweet in tweets:
...   if tweet.startswith("coffee", 0):
...       print(tweet)
coffee break python

What happens if we set start=7? 

>>> for tweet in tweets:
...   if tweet.startswith("coffee", 7):
...       print(tweet)
i like coffee

Why does it print 'i like coffee'? By calling the find() method, we see that the substring 'coffee' begins at index 7.

>>> 'i like coffee'.find('coffee')
7

Hence, when checking tweet.startswith("coffee", 7) for the tweet 'i like coffee', the result is True.

Let’s add another argument – the end index – to the last snippet:

>>> for tweet in tweets:
...   if tweet.startswith("coffee", 7, 9):
...       print(tweet)

Nothing is printed to the console. This is because we are only searching over 2 characters – beginning from index 7 (inclusive) and ending at index 9 (exclusive). But we are searching for ‘coffee’ and it is 6 characters long. As 6 > 2, the prefix cannot fit into the searched range, so startswith() returns False for every tweet.
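One way to picture the start and end arguments is as a slice of the string: the prefix has to fit inside tweet[start:end]. The short sketch below illustrates this for the tweet used above; the variable names too_short and just_enough are only illustrative.

```python
tweet = "i like coffee"

# tweet[7:9] is only 'co', so the 6-character prefix cannot fit.
too_short = tweet.startswith("coffee", 7, 9)

# tweet[7:13] is 'coffee'; end is exclusive, so 13 is the smallest end that works.
just_enough = tweet.startswith("coffee", 7, 13)

print(tweet[7:9], too_short)      # co False
print(tweet[7:13], just_enough)   # coffee True
```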

Now that you know everything about Python’s startswith method, let’s go back to our original question:

Can You Use a Regular Expression with the Python startswith() Method?

No. The startswith() method does not accept regular expressions. You can only search for a string.

A regular expression can describe an infinite set of matching strings. For example, 'A.*' matches all strings starting with 'A'. Matching such patterns can be computationally expensive. So, for performance reasons, it makes sense that startswith() doesn’t accept regular expressions.

But is it also true that startswith only accepts a single string as argument? Not at all. It is possible to do the following:

Python startswith() Tuple – Check For Multiple Strings

>>> for tweet in tweets:
...   if tweet.startswith(("coffee", "i")):
...       print(tweet)
coffee break python
i like coffee

This snippet prints all strings that start with either "coffee" or "i". It is pretty efficient too. Unfortunately, you can only check a finite set of arguments. If you need to check an infinite set, you cannot use this method.

What Happens If I Pass A Regular Expression To startswith()?

Let’s check whether a tweet starts with any version of the "coffee" string. In other words, we want to apply the regex "coff*" so that we match strings like "coffee", "coffees" and "coffe".

>>> tweets = ["to thine own self be true",
...           "coffee break python",
...           "coffees are awesome",
...           "coffe is cool"]

>>> for tweet in tweets:
...     if tweet.startswith("coff*"):
...         print(tweet)
# No output :(

This doesn’t work. In regular expressions, * is a quantifier that means “zero or more repetitions of the preceding character”. But in the startswith() method, it just means the star character '*'. Since none of the tweets start with the literal string 'coff*', Python prints nothing to the screen.
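To see that startswith() really does treat the star literally, here is a small sketch with one made-up tweet that contains an actual '*' character (the second list entry is invented purely for this illustration).

```python
tweets = ["coffee break python",
          "coff* is a strange word"]  # hypothetical tweet with a literal star

# startswith() compares character by character, so '*' only matches '*'.
starred = [tweet for tweet in tweets if tweet.startswith("coff*")]

print(starred)  # ['coff* is a strange word']
```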

So you might ask:

What Are The Alternatives to Using Regular Expressions in startswith()?

There is one alternative that is simple and clean: use the re module. This is Python’s built-in module for working with regular expressions.

>>> import re
>>> tweets = ["to thine own self be true",
...           "coffee break python",
...           "coffees are awesome",
...           "coffe is cool"]

# Success!
>>> for tweet in tweets:
...     if re.match("coff*", tweet):
...         print(tweet)
coffee break python
coffees are awesome
coffe is cool

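Success! We’ve now printed all the tweets we expected. That is, all tweets that start with “cof” followed by any number of additional “f” characters. Whatever comes after that prefix does not matter, because re.match() only requires the pattern to match at the beginning of the string.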

Note that this method is slower than startswith(). Evaluating a regular expression is a more expensive operation than comparing a fixed prefix. But we got the result we wanted. Slow and successful is better than fast and unsuccessful.

The function re.match() takes two arguments. First, the regular expression to be matched. Second, the string you want to search. If the regular expression matches at the beginning of the string, it returns a match object, which counts as True in an if statement. If not, it returns None, which counts as False. In this case, it returns None for “to thine own self be true” and a match object for the rest.
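Strictly speaking, re.match() hands back a match object or None rather than a Boolean, and it only looks at the beginning of the string. The following sketch makes both points visible; the names hit, miss and late are just illustrative.

```python
import re

hit = re.match("coff*", "coffee break python")
miss = re.match("coff*", "to thine own self be true")
late = re.match("coff*", "i like coffee")  # 'coffee' is present, but not at the start

print(hit.group())             # coff
print(miss)                    # None
print(late)                    # None
print(bool(hit), bool(miss))   # True False
```

Note that hit.group() is only 'coff': the pattern covers "cof" plus the extra "f", and the rest of the tweet is simply ignored.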

So let’s summarize the article.

Summary: Can You Use a Regular Expression with the Python startswith Method?

No, you cannot use a regular expression with the Python startswith function. But you can use the Python regular expression module re instead. It’s as simple as calling the function re.match(s1, s2). This checks whether the regular expression s1 matches at the beginning of the string s2.
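If you need this check in several places, one option is to wrap it in a small helper. The sketch below is only one possible way to do it: the function name starts_with_pattern and the constant COFFEE_PREFIX are hypothetical names, not part of the re module.

```python
import re

# Compile the pattern once so it can be reused for every tweet.
COFFEE_PREFIX = re.compile(r"coff*")

def starts_with_pattern(text, pattern=COFFEE_PREFIX):
    """Return True if the compiled pattern matches at the start of text."""
    return pattern.match(text) is not None

tweets = ["to thine own self be true",
          "coffee break python",
          "coffees are awesome",
          "coffe is cool"]

matching = [tweet for tweet in tweets if starts_with_pattern(tweet)]
print(matching)
# ['coffee break python', 'coffees are awesome', 'coffe is cool']
```

Returning a plain Boolean keeps the helper close to how startswith() behaves, so it can be swapped in with little change to the calling code.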

Python Startswith() List

Given that we can pass a tuple to startswith(), what happens if we pass a list?

>>> s = 'a string!'
>>> if s.startswith(['a', 'b', 'c']):
...     print('yay!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list

Python raises a TypeError. We can only pass a string or a tuple of strings to startswith(). So if we have a list of prefixes we want to check, we can call tuple() before passing it to startswith().

>>> if s.startswith(tuple(['a', 'b', 'c'])):
...     print('yay!')
yay!

This works well and is fine performance-wise. Yet, one of Python’s key features is its flexibility. So is it possible to get the same outcome without changing our list of letters to a tuple? Of course it is!

We have two options:

  1. any + list comprehension
  2. any + map

The any() function is a way to combine several conditions with a logical or. It takes one argument – an iterable of Boolean values – and returns True if at least one of them is True. So instead of writing

if s.startswith('a') or s.startswith('b') or s.startswith('c'):
    pass  # some code

We write

# any takes 1 argument - an iterable
if any([s.startswith('a'),
        s.startswith('b'),
        s.startswith('c')]):
    pass  # some code

This is much nicer to read and is especially useful if you are combining many conditions. We can improve this by first creating a list of conditions and passing this to any().

letters = ['a', 'b', 'c']
# One Boolean per prefix: does s start with this letter?
conditions = [s.startswith(letter) for letter in letters]

if any(conditions):
    pass  # do something

Alternatively, we can use map instead of a list comprehension:

letters = ['a', 'b', 'c']
if any(map(s.startswith, letters)):
    pass  # do something

Both have the same outcome. We personally prefer list comprehensions and think they are more readable. But choose whichever you prefer.  
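A third variant worth knowing is a generator expression, which lets any() stop at the first prefix that matches instead of checking every prefix up front. The sketch below records which prefixes were actually tested; the checked list and the starts_with helper are only there for illustration.

```python
s = "a string!"
letters = ["a", "b", "c"]

checked = []  # records which prefixes were actually tested

def starts_with(prefix):
    checked.append(prefix)
    return s.startswith(prefix)

# No square brackets: the generator yields results one at a time,
# and any() stops as soon as it sees the first True.
result = any(starts_with(letter) for letter in letters)

print(result, checked)  # True ['a']
```

With a list comprehension, all three prefixes would be tested before any() runs. For a handful of prefixes the difference is negligible, but it can matter for long lists.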
