You’re about to learn one of the most frequently used regex operators: the dot regex . in Python’s re library.
Related article: Python Regex Superpower – The Ultimate Guide
Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.
What’s the Dot Regex in Python’s Re Library?
The dot regex . matches any single character except the newline character. For example, the regular expression '...' matches strings 'hey' and 'tom'. But it does not match the string 'yo\nto' which contains the newline character '\n'. Combined with the asterisk quantifier in the pattern '.*', the dot regex matches an arbitrary number of symbols except newline characters.
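As a quick check of the claims above, here is a minimal sketch. It assumes that "matches the string" means the pattern covers the whole string, which is what re.fullmatch() tests. The variable names are only for illustration.

```python
import re

# The three strings from the paragraph above.
candidates = ['hey', 'tom', 'yo\nto']

# re.fullmatch() succeeds only if '...' covers the whole string.
results = {c: bool(re.fullmatch('...', c)) for c in candidates}
print(results)  # {'hey': True, 'tom': True, 'yo\nto': False}

# '.*' consumes as many characters as possible but stops before '\n'.
first_line = re.match('.*', 'yo\nto').group()
print(repr(first_line))  # 'yo'
```

The last call shows why '.*' is often described as "match the rest of the line": it never crosses a newline unless you ask for that explicitly.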
Dot Regex Examples
Let’s study some basic examples to gain a deeper understanding.
>>> import re
>>>
>>> text = '''But then I saw no harm, and then I heard
... Each syllable that breath made up between them.'''
>>> re.findall('B..', text)
['But']
>>> re.findall('heard.Each', text)
[]
>>> re.findall('heard\nEach', text)
['heard\nEach']
You first import Python’s re library for regular expression handling. Then, you create a multi-line text using the triple string quotes.
Let’s dive into the first example:
>>> re.findall('B..', text)
['But']
You use the re.findall() method. Here’s the definition from the Finxter blog article:
The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.
The first argument is the regular expression pattern 'B..'. The second argument is the string to be searched for the pattern. You want to find all substrings starting with the 'B' character, followed by two arbitrary characters except the newline character.
The findall() method finds only one such occurrence: the string 'But'.
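One detail worth seeing in code: re.findall() scans from left to right and returns non-overlapping matches. The sketch below uses made-up sample strings (not from the example text above) to illustrate this with the dot regex.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
pairs = re.findall('..', 'abcde')
print(pairs)  # ['ab', 'cd'] -- the trailing 'e' has no partner

b_words = re.findall('B..', 'But Bob Bo')
print(b_words)  # ['But', 'Bob'] -- 'Bo' lacks a third character
```

Each match consumes its characters, so the next search starts right after the previous match.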
The second example shows that the dot operator does not match the newline character:
>>> re.findall('heard.Each', text)
[]
In this example, you’re looking at the simple pattern 'heard.Each'. You want to find all occurrences of the string 'heard' followed by an arbitrary non-newline character, followed by the string 'Each'.
But no such substring exists, because in the text a newline separates 'heard' from 'Each'. Many coders intuitively read the dot regex as an arbitrary character. You must be aware that the correct definition of the dot regex is an arbitrary character except the newline. This is a source of many bugs in regular expressions.
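To make the distinction concrete, the following sketch tries the same pattern with different separator characters between 'heard' and 'Each'. The separator list is an assumption chosen for illustration.

```python
import re

# Hypothetical separators: space, tab, hyphen, and newline.
separators = [' ', '\t', '-', '\n']

matched = {
    sep: bool(re.search('heard.Each', 'heard' + sep + 'Each'))
    for sep in separators
}
print(matched)  # {' ': True, '\t': True, '-': True, '\n': False}
```

The dot happily matches other whitespace such as spaces and tabs. Only the newline is excluded.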
The third example shows you how to explicitly match the newline character '\n' instead:
>>> re.findall('heard\nEach', text)
['heard\nEach']
Now, the regex engine matches the substring.
Naturally, the following relevant question arises:
How to Match an Arbitrary Character (Including Newline)?
The dot regex . matches a single arbitrary character, except the newline character. But what if you do want to match the newline character, too? There are two main ways to accomplish this.
- Use the re.DOTALL flag.
- Use a character class such as [\s\S].
Here’s the concrete example showing both cases:
>>> import re
>>>
>>> s = '''hello
... python'''
>>> re.findall('o.p', s)
[]
>>> re.findall('o.p', s, flags=re.DOTALL)
['o\np']
>>> re.findall(r'o[\s\S]p', s)
['o\np']
You create a multi-line string. Then you try to find the regex pattern 'o.p' in the string. But there’s no match because the dot operator does not match the newline character by default. However, if you define the flag re.DOTALL, the newline character will also be a valid match.
Learn more about the different flags in my Finxter blog tutorial.
An alternative is to use the slightly more complicated regex pattern [\s\S]. The square brackets enclose a character class, a set of characters that are all a valid match. Think of a character class as an OR operation: exactly one character must match. The class [\s\S] combines whitespace characters (\s) and non-whitespace characters (\S), so it matches every character, including the newline. Note that the similar-looking class [.\n] does not do this: inside square brackets the dot loses its special meaning and matches only a literal period, so [.\n] matches just a period or a newline.
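The following sketch compares the options for matching any character including the newline. It also checks how a dot behaves inside square brackets, where it is a literal period rather than a wildcard. The inline flag (?s) is the in-pattern spelling of re.DOTALL. The second test string is made up for illustration.

```python
import re

s = 'hello\npython'

# Option 1: the re.DOTALL flag, as an argument or inline as (?s).
with_flag = re.findall('o.p', s, flags=re.DOTALL)
with_inline_flag = re.findall('(?s)o.p', s)

# Option 2: a character class that covers every character.
with_class = re.findall(r'o[\s\S]p', s)

print(with_flag, with_inline_flag, with_class)

# Inside a character class the dot is a literal period:
# [.\n] matches '.' or '\n', but not 'x'.
literal_dot = re.findall(r'o[.\n]p', 'oxp o.p')
print(literal_dot)  # ['o.p']
```

All three variants in the first part return ['o\np']. The last call shows that [.\n] only works on the original example because the character between 'o' and 'p' happens to be a newline.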
How to Match the Dot Character (Without Special Meaning)?
If you use the character '.' in a regular expression, Python assumes that it’s the dot operator you’re talking about. But what if you actually want to match a dot, for example to match the period at the end of a sentence?
Nothing simpler than that: escape the dot regex by using the backslash: '\.'. The backslash nullifies the meaning of the special symbol '.' in the regex. The regex engine now knows that you’re actually looking for the dot character, not an arbitrary character except newline. Write the pattern as a raw string, r'\.', so that Python passes the backslash to the regex engine unchanged.
Here’s an example:
>>> import re
>>> text = 'Python. Is. Great. Period.'
>>> re.findall(r'\.', text)
['.', '.', '.', '.']
The findall() method returns all four periods in the sentence as matching substrings for the regex '\.'.
In the next example, you’ll learn how you can combine the escaped dot with other regular expressions:
>>> re.findall(r'\.\s', text)
['. ', '. ', '. ']
Now, you’re looking for a period character followed by an arbitrary whitespace character. There are only three such matching substrings in the text because no whitespace follows the final period.
In the next example, you learn how to combine this with a character class:
>>> re.findall(r'[st]\.', text)
['s.', 't.']
You want to find either character 's' or character 't' followed by the period character '.'. Two substrings match this regex.
Note that the backslash is required. If you forget it, the dot becomes a wildcard again, which can lead to strange behavior:
>>> re.findall('[st].', text)
['th', 's.', 't.']
As an arbitrary character is allowed after the character class, the substring 'th' also matches the regex.
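Here is a small sketch of the same pitfall with a number, plus the standard-library helper re.escape(), which adds the backslashes for you. The sample string is hypothetical.

```python
import re

# Hypothetical sample text for illustration.
prices = 'Total: 3.14 or 3x14'

# Unescaped dot: a wildcard, so '3x14' matches too.
unescaped = re.findall('3.14', prices)
print(unescaped)  # ['3.14', '3x14']

# Escaped dot: only the literal period matches.
escaped = re.findall(r'3\.14', prices)
print(escaped)  # ['3.14']

# re.escape() escapes special characters in a literal string.
auto_escaped = re.findall(re.escape('3.14'), prices)
print(auto_escaped)  # ['3.14']
```

re.escape() is handy when the text to search for comes from user input or a variable and you cannot escape it by hand.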
[Collection] What Are The Different Python Re Quantifiers?
If you want to use (and understand) regular expressions in practice, you’ll need to know the most important quantifiers that can be applied to any regex (including the dot regex)!
So let’s dive into the other regexes:
Quantifier | Description | Example |
. | The wild-card (‘dot’) matches any character in a string except the newline character '\n'. | Regex '...' matches all words with three characters such as 'abc', 'cat', and 'dog'. |
* | The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. | Regex 'cat*' matches the strings 'ca', 'cat', 'catt', 'cattt', and 'catttttttt'. |
? | The zero-or-one matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. | Regex 'cat?' matches both strings 'ca' and 'cat', but not 'catt', 'cattt', and 'catttttttt'. |
+ | The at-least-one matches one or more occurrences of the immediately preceding regex. | Regex 'cat+' does not match the string 'ca' but matches all strings with at least one trailing character 't' such as 'cat', 'catt', and 'cattt'. |
^ | The start-of-string matches the beginning of a string. | Regex '^p' matches the strings 'python' and 'programming' but not 'lisp' and 'spying' where the character 'p' does not occur at the start of the string. |
$ | The end-of-string matches the end of a string. | Regex 'py$' matches strings that end with 'py', but not the strings 'python' and 'pypi'. |
A|B | The OR matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. | For example, the regex 'hello|hi' finds a match in both of the strings 'hello world' and 'hi python'. It wouldn’t make sense to try to match both alternatives at the same time. |
AB | The AND matches first the regex A and second the regex B, in this sequence. | We’ve already seen it trivially in the regex 'ca' that matches first regex 'c' and second regex 'a'. |
Note that I gave the above operators some more meaningful names so that you can immediately grasp the purpose of each regex. For example, the '^' operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.
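The table rows for *, ?, and + can be checked in a few lines. This is a minimal sketch that assumes "matches the string" means a full match, so it uses re.fullmatch(). The anchor checks use re.search() with a made-up word 'happy' next to the words from the table.

```python
import re

candidates = ['ca', 'cat', 'catt']
patterns = ['cat*', 'cat?', 'cat+']

# For each pattern, record which candidates it matches completely.
table = {
    p: [bool(re.fullmatch(p, c)) for c in candidates]
    for p in patterns
}
for pattern, row in table.items():
    print(pattern, row)
# cat* [True, True, True]
# cat? [True, True, False]
# cat+ [False, True, True]

# Anchors: '^p' needs 'p' at the start, 'py$' needs 'py' at the end.
starts = [bool(re.search('^p', w)) for w in ['python', 'lisp']]
ends = [bool(re.search('py$', w)) for w in ['happy', 'python']]
print(starts, ends)  # [True, False] [True, False]
```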
We’ve already seen many examples but let’s dive into even more!
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
In these examples, you’ve already seen the special symbol '\n' which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions.
Related Re Methods
There are seven important regular expression methods which you should master:
- The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
- The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
- The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
- The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
- The re.compile(pattern) method prepares the regular expression pattern and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
- The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
- The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.
These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.
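To see the seven methods side by side, here is a minimal sketch that runs each of them once on the sentence used earlier in this tutorial. The patterns are chosen only for illustration.

```python
import re

text = 'Python. Is. Great. Period.'

# findall: list of all matching substrings.
words = re.findall(r'\w+', text)
# search: match object for the first match anywhere in the string.
first = re.search(r'\w+\.', text).group()
# match: only succeeds at the beginning of the string.
at_start = re.match('Is', text)
# fullmatch: the pattern must cover the whole string.
whole = re.fullmatch('.*', text) is not None
# compile: build a reusable regex object.
period_space = re.compile(r'\.\s')
# split: divide the string along all matches (here via the compiled object).
parts = period_space.split(text)
# sub: replace all matches with the replacement string.
shouted = re.sub(r'\.', '!', text)

print(words)     # ['Python', 'Is', 'Great', 'Period']
print(first)     # Python.
print(at_start)  # None
print(whole)     # True
print(parts)     # ['Python', 'Is', 'Great', 'Period.']
print(shouted)   # Python! Is! Great! Period!
```

Note that re.match('Is', text) returns None even though 'Is' occurs in the text, because the match must start at the first character.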
Where to Go From Here?
You’ve learned everything you need to know about the dot regex . in this regex tutorial.
Summary: The dot regex . matches all characters except the newline character. For example, the regular expression '...' matches strings 'hey' and 'tom'. But it does not match the string 'yo\nto' which contains the newline character '\n'.