Python Regex Or - A Simple Illustrated Guide - Be on the Right Side of Change

This tutorial is all about the or | operator of Python’s re library. You can also play the tutorial video while you read:

Related article: Python Regex Superpower – The Ultimate Guide

Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.

What’s the Python Regex Or | Operator?

Given a string. Say, your goal is to find all substrings that match either the string 'iPhone' or the string 'iPad'. How can you achieve this?

The easiest way to achieve this is the Python or operator | using the regular expression pattern (iPhone|iPad).

Here’s an example:

>>> import re
>>> text = 'Buy now: iPhone only $399 with free iPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPad']

You have the (salesy) text that contains both strings 'iPhone' and 'iPad'.

You use the re.findall() method. In case you don’t know it, here’s the definition from the Finxter blog article:

The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.

Please consult the blog article to learn everything you need to know about this fundamental Python method.

The first argument is the pattern (iPhone|iPad). It either matches the first part right in front of the or symbol |—which is iPhone—or the second part after it—which is iPad.

The second argument is the text 'Buy now: iPhone only $399 with free iPad' which you want to search for the pattern.

The result shows that there are two matching substrings in the text: 'iPhone' and 'iPad'.

Python Regex Or: Examples

Let’s study some more examples to teach you all the possible uses and border cases—one after another.

You start with the previous example:

>>> import re
>>> text = 'Buy now: iPhone only $399 with free iPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPad']

What happens if you don’t use the parenthesis?

>>> text = 'iPhone iPhone iPhone iPadiPad'
>>> re.findall('(iPhone|iPad)', text)
['iPhone', 'iPhone', 'iPhone', 'iPad', 'iPad']
>>> re.findall('iPhone|iPad', text)
['iPhone', 'iPhone', 'iPhone', 'iPad', 'iPad']

In the second example, you just skipped the parentheses using the regex pattern iPhone|iPad rather than (iPhone|iPad). But no problem–it still works and generates the exact same output!

But what happens if you leave one side of the or operation empty?

>>> re.findall('iPhone|', text)
['iPhone', '', 'iPhone', '', 'iPhone', '', '', '', '', '', '', '', '', '', '']

The output is not as strange as it seems. The or operator allows for empty operands—in which case it wants to match the non-empty string. If this is not possible, it matches the empty string (so everything will be a match).

The previous example also shows that it still tries to match the non-empty string if possible. But what if the trivial empty match is on the left side of the or operand?

>>> re.findall('|iPhone', text)
['', 'iPhone', '', '', 'iPhone', '', '', 'iPhone', '', '', '', '', '', '', '', '', '', '']

This shows some subtleties of the regex engine. First of all, it still matches the non-empty string if possible! But more importantly, you can see that the regex engine matches from left to right. It first tries to match the left regex (which it does on every single position in the text). An empty string that’s already matched will not be considered anymore. Only then, it tries to match the regex on the right side of the or operator.

Think of it this way: the regex engine moves from the left to the right—one position at a time. It matches the empty string every single time. Then it moves over the empty string and in some cases, it can still match the non-empty string. Each match “consumes” a substring and cannot be matched anymore. But an empty string cannot be consumed. That’s why you see the first match is the empty string and the second match is the substring 'iPhone'.

How to Nest the Python Regex Or Operator?

Okay, you’re not easily satisfied, are you? Let’s try nesting the Python regex or operator |.

>>> text = 'xxx iii zzz iii ii xxx'
>>> re.findall('xxx|iii|zzz', text)
['xxx', 'iii', 'zzz', 'iii', 'xxx']

So you can use multiple or operators in a row. Of course, you can also use the grouping (parentheses) operator to nest an arbitrary complicated construct of or operations:

>>> re.findall('x(i|(zz|ii|(x| )))', text)
[('x', 'x', 'x'), (' ', ' ', ' '), ('x', 'x', 'x')]

But this seldomly leads to clean and readable code. And it can usually avoided easily by putting a bit of thought into your regex design.

Python Regex Or: Character Class

If you only want to match a single character out of a set of characters, the character class is a much better way of doing it:

>>> import re
>>> text = 'hello world'
>>> re.findall('[abcdefghijklmnopqrstuvwxyz]+', text)
['hello', 'world']

A shorter and more concise version would be to use the range operator within character classes:

>>> re.findall('[a-z]+', text)
['hello', 'world']

The character class is enclosed in the bracket notation [ ] and it literally means “match exactly one of the symbols in the class”. Thus, it carries the same semantics as the or operator: |. However, if you try to do something on those lines…

>>> re.findall('(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)+', text)
['o', 'd']

… you’ll first write much less concise code and, second, risk of getting confused by the output. The reason is that the parenthesis is the group operator—it captures the position and substring that matches the regex. Used in the findall() method, it only returns the content of the last matched group. This turns out to be the last character of the word 'hello' and the last character of the word 'world'.

How to Match the Or Character (Vertical Line ‘|’)?

So if the character '|' stands for the or character in a given regex, the question arises how to match the vertical line symbol '|' itself?

The answer is simple: escape the or character in your regular expression using the backslash. In particular, use 'A\|B' instead of 'A|B' to match the string 'A|B' itself. Here’s an example:

>>> import re
>>> re.findall('A|B', 'AAAA|BBBB')
['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
>>> re.findall('A\|B', 'AAAA|BBBB')
['A|B']

Do you really understand the outputs of this code snippet? In the first example, you’re searching for either character 'A' or character 'B'. In the second example, you’re searching for the string 'A|B' (which contains the '|' character).

Python Regex And

If there’s a Python regex “or”, there must also be an “and” operator, right?

Correct! But think about it for a moment: say, you want one regex to occur alongside another regex. In other words, you want to match regex A and regex B. So what do you do? You match regex AB.

You’ve already seen many examples of the “Python regex AND” operator—but here’s another one:

>>> import re
>>> re.findall('AB', 'AAAACAACAABAAAABAAC')
['AB', 'AB']

The simple concatenation of regex A and B already performs an implicit “and operation”.

👉 Recommended Tutorial: Python Regex AND (yes, it does exist)

Python Regex Not

How can you search a string for substrings that do NOT match a given pattern? In other words, what’s the “negative pattern” in Python regular expressions?

The answer is two-fold:

If you want to match all characters except a set of specific characters, you can use the negative character class [^...].
If you want to match all substrings except the ones that match a regex pattern, you can use the feature of negative lookahead (?!...).

Here’s an example for the negative character class:

>>> import re
>>> re.findall('[^a-m]', 'aaabbbaababmmmnoopmmaa')
['n', 'o', 'o', 'p']

And here’s an example for the negative lookahead pattern to match all “words that are not followed by words”:

>>> re.findall('[a-z]+(?![a-z]+)', 'hello world')
['hello', 'world']

The negative lookahead (?![a-z]+) doesn’t consume (match) any character. It just checks whether the pattern [a-z]+ does NOT match at a given position. The only times this happens is just before the empty space and the end of the string.

[Collection] What Are The Different Python Re Quantifiers?

The “and”, “or”, and “not” operators are not the only regular expression operators you need to understand. So what are other operators?

Next, you’ll get a quick and dirty overview of the most important regex operations and how to use them in Python. Here are the most important regex quantifiers:

Quantifier	Description	Example
`.`	The wild-card (‘dot’) matches any character in a string except the newline character ‘n’.	Regex ‘…’ matches all words with three characters such as ‘abc’, ‘cat’, and ‘dog’.
`*`	The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex.	Regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’.
`?`	The zero-or-one matches (as the name suggests) either zero or one occurrences of the immediately preceding regex.	Regex ‘cat?’ matches both strings ‘ca’ and ‘cat’ — but not ‘catt’, ‘cattt’, and ‘catttttttt’.
`+`	The at-least-one matches one or more occurrences of the immediately preceding regex.	Regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’.
`^`	The start-of-string matches the beginning of a string.	Regex ‘^p’ matches the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string.
`$`	The end-of-string matches the end of a string.	Regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’.
`A\|B`	The OR matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions.	Regex ‘(hello)\|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time.
`AB`	The AND matches first the regex A and second the regex B, in this sequence.	We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’.

Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.

We’ve already seen many examples but let’s dive into even more!

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''


print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character. 
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('n$', text))
'''
Finds all occurrences where the new-line character 'n'
occurs at the end of the string.
['n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''

In these examples, you’ve already seen the special symbol ‘\n’ which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions. Next, we’ll discover the most important special symbols.

Related Re Methods

There are seven important regular expression methods which you must master:

The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
The re.sub(The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.

These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.

Where to Go From Here?

You’ve learned everything you need to know about the Python Regex Or Operator.

Summary:

Given a string. Say, your goal is to find all substrings that match either the string 'iPhone' or the string 'iPad'. How can you achieve this?

The easiest way to achieve this is the Python or operator | using the regular expression pattern (iPhone|iPad).

Regex Humor

*Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee.* (source)

Want to earn money while you learn Python? Average Python programmers earn more than $50 per hour. You can certainly become average, can’t you?

Join the free webinar that shows you how to become a thriving coding business owner online!

[Webinar] Become a Six-Figure Freelance Developer with Python

Join us. It’s fun! 🙂