When I first learned about regular expressions, I didn’t appreciate their power. But there’s a reason regular expressions have survived seven decades of technological disruption: coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the search() method of Python’s re library. To learn about the easy-to-use but less powerful findall() method, which returns a list of string matches, check out our article about that method.
Related article: Python Regex Superpower – The Ultimate Guide
Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.
So how does the re.search() method work? Let’s study the specification.
How Does re.search() Work in Python?
The re.search(pattern, string) method matches the first occurrence of the pattern in the string and returns a match object.
re.search(pattern, string, flags=0)
The re.search() method has up to three arguments.
pattern: the regular expression pattern that you want to match.
string: the string which you want to search for the pattern.
flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know how to use those flags? Check out this detailed article on the Finxter blog.
We’ll explore them in more detail later.
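As a minimal sketch of how the three arguments fit together (the sample sentence and variable names are made up for illustration), the calls below pass the arguments with and without the optional flags:

```python
import re

# Hypothetical sample data, not taken from the article
sentence = 'The quick brown fox'

# pattern and string only; flags keeps its default value 0
m1 = re.search('quick', sentence)

# all three arguments; re.IGNORECASE makes the matching case-insensitive
m2 = re.search('QUICK', sentence, flags=re.IGNORECASE)

# without the flag, the all-caps pattern is not found and search() returns None
m3 = re.search('QUICK', sentence)

print(m1.span())  # (4, 9)
print(m2.span())  # (4, 9)
print(m3)         # None
```

Note that search() returns None rather than raising an error when nothing matches, so it is worth checking the result before calling methods on it.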
The re.search() method returns a match object if the pattern matches and None otherwise. You may ask (and rightly so):
What’s a Match Object?
If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?
The match object is a simple wrapper for this information. Some regex methods of the re package in Python, such as search(), automatically create a match object upon the first pattern match.
At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:
>>> m = re.search('h...o', 'hello world')
>>> m.start()
0
>>> m.end()
5
>>> 'hello world'[m.start():m.end()]
'hello'
In the first line, you create a match object m by using the re.search() method. The pattern 'h...o' matches in the string 'hello world' at start position 0. You use the start and end position to access the substring that matches the pattern (using the popular Python technique of slicing).
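Slicing is not the only way to get at the matched text. As a small sketch using two further match object methods from the standard re module, group() and span(), the same information can be read off directly:

```python
import re

m = re.search('h...o', 'hello world')

# group() returns the matched substring directly, so no slicing is needed
print(m.group())  # hello

# span() returns the start and end positions as a single tuple
print(m.span())   # (0, 5)

# a match object is always truthy, which is handy in if statements
if m:
    print('found a match')
```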
Now you know the purpose of the match object in Python. Let’s check out a few examples of re.search()!
A Guided Example for re.search()
First, you import the re module and create the text string to be searched for the regex patterns:
>>> import re
>>> text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''
Let’s say you want to search the text for the string ‘her’:
>>> re.search('her', text)
<re.Match object; span=(20, 23), match='her'>
The first argument is the pattern to be found. In our case, it’s the string 'her'. The second argument is the text to be analyzed. You stored the multi-line string in the variable text, so you take this as the second argument. You don’t need to define the optional third argument flags of the search() method because you’re fine with the default behavior in this case.
Look at the output: it’s a match object! The match object gives the span of the match, that is, the start and stop indices of the match. We can also directly access those boundaries by using the start() and end() methods of the match object:
>>> m = re.search('her', text)
>>> m.start()
20
>>> m.end()
23
The problem is that the search() method only retrieves the first occurrence of the pattern in the string. If you want to find all matches in the string, you may want to use the findall() method of the re library.
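To make this concrete, here is a short sketch that runs both methods on the verse from the guided example. The verse is repeated so that the snippet runs on its own, and the four-space indentation of each line is an assumption chosen to be consistent with the span (20, 23) shown above:

```python
import re

# The verse from the guided example, repeated so the snippet is self-contained
text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

# search() stops at the first occurrence
first = re.search('her', text)
print(first.span())   # (20, 23)

# findall() collects every occurrence as a plain string
all_matches = re.findall('her', text)
print(all_matches)    # ['her', 'her', 'her']
```

The capitalized 'Her' at the start of the second line is not in the list because matching is case-sensitive by default.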
What’s the Difference Between re.search() and re.findall()?
There are two differences between the re.search(pattern, string) and re.findall(pattern, string) methods:
re.search(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
re.search(pattern, string) returns only the first match in the string while re.findall(pattern, string) returns all matches in the string.
Both can be seen in the following example:
>>> text = 'Python is superior to Python'
>>> re.search('Py...n', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.findall('Py...n', text)
['Python', 'Python']
The string 'Python is superior to Python' contains two occurrences of 'Python'. The search() method only returns a match object of the first occurrence. The findall() method returns a list of all occurrences.
What’s the Difference Between re.search() and re.match()?
Both re.search(pattern, string) and re.match(pattern, string) return a match object of the first match. However, re.match() attempts to match only at the beginning of the string while re.search() matches anywhere in the string.
You can see this difference in the following code:
>>> text = 'Slim Shady is my name'
>>> re.search('Shady', text)
<re.Match object; span=(5, 10), match='Shady'>
>>> re.match('Shady', text)
>>>
The re.search() method retrieves the match of the 'Shady' substring as a match object. But if you use the re.match() method, there is no match: the method returns None (which the interactive shell doesn’t print) because the substring 'Shady' does not occur at the beginning of the string 'Slim Shady is my name'.
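Because a failed match yields None, calling a method such as start() on the result would raise an AttributeError. A minimal sketch of a defensive check (the variable names are illustrative) might look like this:

```python
import re

text = 'Slim Shady is my name'

# match() only looks at the beginning of the string, so this yields None
no_match = re.match('Shady', text)
print(no_match is None)   # True

# 'Slim' does occur at the beginning, so match() succeeds here
at_start = re.match('Slim', text)
print(at_start.span())    # (0, 4)

# check for None before using the match object
for m in (no_match, at_start):
    if m is None:
        print('no match')
    else:
        print('matched', m.group())
```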
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the search() method comes with an optional third argument, flags:
re.search(pattern, string, flags=0)
What’s the purpose of the flags argument?
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).
| Flag | Description |
|---|---|
| re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests. |
| re.A | Same as re.ASCII |
| re.DEBUG | If you use this flag, Python will print some useful information to the shell that helps you debug your regex. |
| re.IGNORECASE | If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. |
| re.I | Same as re.IGNORECASE |
| re.LOCALE | Don’t use this flag. Its use is discouraged: the idea was to make \w, \W, \b, \B and case-insensitive matching depend on your current locale, but the locale mechanism isn’t reliable, and in Python 3 the flag works only with bytes patterns. |
| re.L | Same as re.LOCALE |
| re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex '^' matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex '$', which then matches at the end of each line. |
| re.M | Same as re.MULTILINE |
| re.DOTALL | Without using this flag, the dot regex '.' matches all characters except the newline character '\n'. With this flag, the dot matches every character, including the newline. |
| re.S | Same as re.DOTALL |
| re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern and everything from the character '#' to the end of the line are ignored (except inside a character class or when escaped with a backslash). |
| re.X | Same as re.VERBOSE |
Here’s how you’d use it in a practical example:
>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>
Although your regex 'PYTHON' is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.
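The other flags work the same way. The following sketch, using made-up sample strings, shows re.MULTILINE, re.DOTALL and re.VERBOSE in action, and how several flags can be combined with the | operator:

```python
import re

# Hypothetical two-line sample string
text = 'first line\nsecond line'

# '^' normally matches only at the very start of the string
print(re.search('^second', text))                              # None
# with re.MULTILINE, '^' also matches right after each newline
print(re.search('^second', text, flags=re.MULTILINE).span())   # (11, 17)

# '.' normally does not match the newline character
print(re.search('line.second', text))                          # None
# with re.DOTALL, '.' matches the newline as well
print(re.search('line.second', text, flags=re.DOTALL).span())  # (6, 17)

# several flags can be combined with the bitwise OR operator
combined = re.search('LINE.SECOND', text, flags=re.IGNORECASE | re.DOTALL)
print(combined.span())                                         # (6, 17)

# re.VERBOSE allows whitespace and comments inside the pattern
phone_pattern = r'''
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
'''
phone = re.search(phone_pattern, 'call 555-1234 now', flags=re.VERBOSE)
print(phone.group())                                           # 555-1234
```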
This article has introduced the re.search(pattern, string) method that attempts to match the first occurrence of the regex pattern in a given string and returns a match object.
Python soars in popularity. There are two types of people: those who understand coding and those who don’t. The latter will have larger and larger difficulties participating in the era of massive adoption and penetration of digital content. Do you want to increase your Python skills daily without investing a lot of time?
Where to Go From Here?
Enough theory, let’s get some practice!
To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?
Practice projects are how you sharpen your saw in coding!
Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?
Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.
Join my free webinar “How to Build Your High-Income Skill Python” and watch how I grew my coding business online and how you can, too—from the comfort of your own home.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.