Python Regex Lookahead and Lookbehind: A Helpful Guide

Python Lookahead

You might find yourself working with regular expressions in Python, and the lookahead feature is quite handy in many situations.

Lookahead comes in two flavors: positive lookahead and negative lookahead. Let's explore these types of lookahead and see how you can use them in your Python code! 🐍

Positive lookahead is an assertion that checks whether the text immediately after the current position matches a given pattern. In Python, this is done using the (?=pattern) syntax. If the assertion succeeds, the regex engine continues matching from the same position; if it fails, the engine backtracks and tries the next alternative or the next starting position.

Consider this example:

import re

text = "I love coding in Python!"
pattern = r"\b\w+(?=ing)"
result = re.findall(pattern, text)
print(result)  # Output: ['cod']

Here, we search for word characters that are followed by "ing" and return only the part before "ing". The positive lookahead makes sure that the match is followed by "ing", but it doesn't include those characters in the actual match. 😄

Now let's look at negative lookahead, which asserts that a given pattern is not followed by another pattern. You use the (?!pattern) syntax in Python for this purpose.

Take a look at this example to see how it works:

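pattern = r"\b\w+\b(?!ing)"
result = re.findall(pattern, text)
print(result)  # Output: ['I', 'love', 'coding', 'in', 'Python']

In this case, every word matches, including "coding". Because \b\w+\b consumes the whole word, the negative lookahead is checked at the end of the word, where the next character is a space, punctuation, or the end of the string, so "ing" can never follow. To skip words that end in "ing", place the lookahead before the word instead, for example \b(?!\w*ing\b)\w+. 🔎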
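As a minimal sketch, a negative lookahead placed at the start of a word can reject words that end in "ing". The pattern and variable names below are illustrative assumptions, one possible way to write this rather than the only one.

```python
import re

text = "I love coding in Python!"

# (?!\w*ing\b) is checked at the start of each word: it rejects the word
# if a run of word characters followed by "ing" reaches the word's end.
pattern = r"\b(?!\w*ing\b)\w+"
words_without_ing = re.findall(pattern, text)
print(words_without_ing)  # ['I', 'love', 'in', 'Python']
```

Note that the exclamation mark is never part of a match, because \w does not match punctuation.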

You can also use multiple lookaheads, positive or negative, to create more complex patterns:

pattern = r"\b\w+(?=ing)(?!Python)"
result = re.findall(pattern, text)
print(result)  # Output: ['cod']

Here, we use a positive lookahead to require "ing" after the match and a negative lookahead to rule out "Python" at the same position. Both assertions are checked at the same spot, right after "cod". The text there is "ing", so the positive lookahead succeeds and the negative lookahead for "Python" passes as well, and the result is ['cod']. 📖

💡 Remember that lookahead assertions do not consume any characters in the matching process, making them zero-width assertions. This allows you to perform advanced searches in your strings without affecting the matched content.

Figure: A simple example of lookahead. The regular expression engine matches ("consumes") the string partially. Then it checks whether the remaining pattern could be matched without actually matching it.


Lookahead assertions in Python regexes help you match string patterns that are either followed by (positive lookahead) or not followed by (negative lookahead) a specified pattern. 🚀

Python Lookahead Example 🐍

Let's dive into the positive lookahead syntax: X(?=Y). This syntax matches the pattern X only if it is immediately followed by the pattern Y.

Consider the following code snippet that demonstrates this concept in action:

import re

string = "I love Python3 and Python2"
pattern = r'Python(?=\d)'

matches = re.findall(pattern, string)

print(matches)  
# Output: ['Python', 'Python']

In the example above, you're searching for the word "Python" only if it is followed by a digit. The positive lookahead doesn't include the digit in the match, so the output is a list containing just the word "Python" twice. 🧠

On the other hand, negative lookahead uses the syntax (?!Y) and matches a pattern only if it is NOT followed by another pattern.

Let's take a look at an example:

pattern = r'Python(?!3)'

matches = re.findall(pattern, string)
print(matches)  # Output: ['Python']

In this example, you're searching for the word "Python" not followed by the digit '3'. The output contains just one match, which is the "Python" followed by the digit '2'. 🎯

Multiple Lookahead: Positive and Negative

At this point, you already know that lookaheads in regex are a powerful way to ensure a certain pattern is immediately followed by another pattern. Positive lookaheads make sure the pattern is there, while negative lookaheads ensure the pattern is not there.

Let's dive into using multiple lookaheads in Python.

Multiple Positive Lookaheads

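Suppose you want to find some text in a string that's followed by multiple specific patterns. You can combine multiple positive lookaheads like this:

X(?=Y)(?=Z)

Keep in mind that both lookaheads are checked at the same position, right after X. The text that follows X must satisfy Y and Z at the same time, not Y first and Z afterwards.

For example, let's say you're searching for a word whose next character is a digit and also a capital letter:

import re

pattern = r'\b\w+(?=\d)(?=[A-Z])'
string = "Python3A Tutorial4B"

matches = re.findall(pattern, string)
print(matches)  # Output: []

Here, the regex \b\w+(?=\d)(?=[A-Z]) requires the character after the match to be both a digit and a capital letter. No single character can be both, so there are no matches. To match a word followed by a digit and then by a capital letter, put the whole sequence into one lookahead: \b\w+(?=\d[A-Z]).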
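The difference between stacking two lookaheads and putting a sequence into a single lookahead can be sketched as follows. The variable names are illustrative assumptions; the second pattern is one way to express "a digit, then a capital letter".

```python
import re

string = "Python3A Tutorial4B"

# Two stacked lookaheads test the SAME next character: it would have to be
# a digit and a capital letter at once, which is impossible.
stacked = re.findall(r"\b\w+(?=\d)(?=[A-Z])", string)

# One lookahead containing the sequence: a digit, then a capital letter.
sequential = re.findall(r"\b\w+(?=\d[A-Z])", string)

print(stacked)     # []
print(sequential)  # ['Python', 'Tutorial']
```

In the second pattern, the greedy \w+ first grabs the whole token and then backtracks until the lookahead sees a digit followed by a capital letter.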

Multiple Negative Lookaheads

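Similarly, you can use multiple negative lookaheads to ensure that your pattern isn't followed by certain patterns. The syntax is similar to positive lookaheads:

X(?!Y)(?!Z)

For instance, let's find a word that is not followed by a digit and also not followed by a capital letter:

import re

pattern2 = r'\b\w+(?!\d)(?![A-Z])'
string = "Python3A Tutorial4B"

matches2 = re.findall(pattern2, string)
print(matches2)  # Output: ['Python3A', 'Tutorial4B']

In this case, the regex \b\w+(?!\d)(?![A-Z]) searches for words not followed by a digit and not followed by a capital letter. Digits and capital letters are word characters themselves, so the greedy \w+ consumes "3A" and "4B" as part of each word. The lookaheads are then checked at the end of each word, where a space or the end of the string follows, and the result is ['Python3A', 'Tutorial4B'].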
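To see how the choice of character class interacts with stacked negative lookaheads, here is a small sketch. The extra word "Guide" in the sample string and the letters-only variant are assumptions added for illustration.

```python
import re

string = "Python3A Tutorial4B Guide"

# Greedy \w+ swallows the trailing digit and letter, so the lookaheads
# only ever see a space or the end of the string.
greedy = re.findall(r"\b\w+(?!\d)(?![A-Z])", string)

# Letters only: (?![A-Za-z]) stops backtracking into a shorter word,
# and (?!\d) rejects words that are directly followed by a digit.
letters_only = re.findall(r"\b[A-Za-z]+(?![A-Za-z])(?!\d)", string)

print(greedy)        # ['Python3A', 'Tutorial4B', 'Guide']
print(letters_only)  # ['Guide']
```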


Common Issues with Lookahead

Python Positive Lookahead Not Working

🤔 So you're trying to use a positive lookahead in your Python regex, but it's not working as expected? Don't worry! Most of the time, this can be resolved by double-checking your regex pattern.

Make sure the syntax is correct: (?=lookahead_regex).

Also, be aware that lookahead assertions are zero-width, meaning they do not consume input characters. Anything you write after the lookahead is matched from the same position, so the rest of the pattern still has to consume the text that the lookahead inspected.

# Proper usage of positive lookahead
pattern = r"abc(?=\d)"

💡 Remember, positive lookahead checks if the string immediately following the current position matches a specific pattern.
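A frequent cause of a positive lookahead "not working" is forgetting that it is zero-width. The sketch below uses a made-up sample string to show the failure and one possible fix.

```python
import re

text = "abc1def"

# The lookahead peeks at "1" but does not move past it, so "def" is then
# compared against "1def" and the match fails.
broken = re.search(r"abc(?=\d)def", text)

# Consuming the digit explicitly after the lookahead fixes the pattern.
fixed = re.search(r"abc(?=\d)\ddef", text)

print(broken)         # None
print(fixed.group())  # abc1def
```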

Python Negative Lookahead Not Working

🀨 Having trouble with negative lookahead in Python regex?

Once again, the key is ensuring that your regex pattern follows the proper syntax: (?!lookahead_regex) (source).

Negative lookahead means the regex engine will exclude matching patterns if they immediately follow the current position.

# Right way to use negative lookahead
pattern = r"abc(?!def)"

🧐 Note that negative lookahead assertions are also zero-width. Common mistakes can include improper use of group assertions or quantifiers in conjunction with lookahead.

Splitting patterns into multiple lookaheads or lookahead+lookbehind pairs can help solve complex matching requirements more effectively.
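One typical surprise with negative lookahead is backtracking after a quantifier. The following sketch uses an invented sample string with CSS-like units; the stricter pattern is one possible remedy, not the only one.

```python
import re

text = "100px 200em"

# Backtracking: "100" is followed by "px", so the engine retries with "10",
# which is followed by "0p" and therefore passes the lookahead.
naive = re.findall(r"\d+(?!px)", text)

# Also forbidding a following digit blocks the shorter match.
strict = re.findall(r"\d+(?!\d|px)", text)

print(naive)   # ['10', '200']
print(strict)  # ['200']
```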

Using Lookahead in Various Functions

Let's discuss how to use lookahead in various Python regex functions, such as re.findall(), re.sub(), and re.match().

Python re.findall() Lookahead 🧐

When you need to find all occurrences of a pattern with lookahead, the re.findall() function is your best friend.

import re

pattern = r'food(?=ie)'
text = "foodie and junkfoodie and seafoodie"
result = re.findall(pattern, text)

print(result)  # Output: ['food', 'food', 'food']

In this example, you're searching for the word 'food' followed by 'ie' without including 'ie' in the result. All three words in the text contain 'food' followed by 'ie', so re.findall() returns three matches.


Python re.sub() Lookahead 🔄

To perform a search-and-replace operation while considering lookahead, you can use the re.sub() function.

pattern = r'food(?=ie)'
text = "foodie and junkfoodie and seafoodie"
result = re.sub(pattern, '***', text)

print(result)  # Output: "***ie and junk***ie and sea***ie"

Here, you're replacing the word 'food' with asterisks (***) only when it's followed by 'ie', without including 'ie' in the replacement.
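Because lookarounds match positions rather than characters, re.sub() can also insert text without replacing anything. As a hedged sketch, the snippet below inserts thousands separators into a plain digit string; it additionally uses a lookbehind, which is covered in the Lookaround section, and assumes the input consists of digits only.

```python
import re

number = "1234567"

# (?<=\d) requires a digit on the left; (?=(?:\d{3})+$) requires that the
# digits remaining up to the end form complete groups of three.
with_commas = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", number)
print(with_commas)  # 1,234,567
```

The pattern matches only empty positions, so every digit of the original string survives the substitution.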


Python re.match() Lookahead 🔍

re.match() can be used with lookahead to check if the input string starts with the specified pattern.

pattern = r'(?=foodie)'
text = "foodie and junkfoodie and seafoodie"
result = re.match(pattern, text)

print(bool(result))  # Output: True

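This example checks whether the string starts with 'foodie' without consuming it. The match itself is the empty string at position 0, which is why the example tests bool(result) rather than printing the match.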
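A common practical use of re.match() with lookaheads is checking several independent conditions from the start of a string. The rules below (at least one digit, at least one capital letter, at least eight characters) and the names PASSWORD_RULES and is_valid are hypothetical, chosen only to illustrate the technique.

```python
import re

# Each lookahead scans ahead from position 0 without consuming anything;
# .{8,}$ then consumes the whole string if it has at least 8 characters.
PASSWORD_RULES = re.compile(r"(?=.*\d)(?=.*[A-Z]).{8,}$")

def is_valid(candidate):
    """Return True if candidate satisfies the hypothetical rules above."""
    return PASSWORD_RULES.match(candidate) is not None

print(is_valid("Passw0rdX"))  # True
print(is_valid("password1"))  # False (no capital letter)
print(is_valid("Sh0rt"))      # False (too short)
```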


Python Lookaround

Lookarounds in a regex are powerful zero-width assertions that allow you to peek around your search string to find a match without including surrounding characters. They come in two main flavors: lookahead and lookbehind 💡.

Lookahead

💡 Positive lookahead syntax: (?=<lookahead_regex>)

💡 Negative lookahead syntax: (?!<lookahead_regex>)

Lookaheads check if your pattern occurs immediately ahead (to the right) of the parser's current position. For example, let's find all alphanumeric characters followed by a decimal digit 🧐.

import re

pattern = r'\w(?=\d)'
text = 'A1B2C3D4'

matches = [match.group() for match in re.finditer(pattern, text)]
print(matches)  # ['A', 'B', 'C', 'D']

Using (?=\d) as a positive lookahead, we only matched alphanumeric characters directly preceding a decimal digit 🎉.

Lookbehind

💡 Positive lookbehind syntax: (?<=<lookbehind_regex>)

💡 Negative lookbehind syntax: (?<!<lookbehind_regex>)

Positive lookbehinds, on the other hand, check the condition to the left of the current position. For example, let's find all whitespace characters after a literal dot 🕵️‍♀️.

pattern = r'(?<=\.)\s'
text = 'Hello. World! How.are.you?'

matches = [match.group() for match in re.finditer(pattern, text)]
print(matches)  # [' ']

Utilizing (?<=\.) as a positive lookbehind, we successfully matched only the whitespace character following a literal dot 🌟.
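Positive and negative lookbehind can be compared side by side. The sample text and variable names in this sketch are assumptions made for illustration.

```python
import re

text = "Cost: $45, fee: $7, id 99"

# Positive lookbehind: digits that come right after a dollar sign.
prices = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
# \b keeps the match from starting in the middle of "45".
other_numbers = re.findall(r"(?<!\$)\b\d+", text)

print(prices)         # ['45', '7']
print(other_numbers)  # ['99']
```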

Frequent Questions

In this section, we'll discuss some frequent questions you might encounter when working with Python regex lookahead. We will provide solutions and examples for each problem, so you can tackle them more confidently in your own projects. 😊

Python Lookahead vs Lookbehind: What's the Difference?

In a nutshell, lookahead checks if a certain pattern follows your match, but doesn't consume characters, while lookbehind checks if a certain pattern comes before your match, and doesn't consume characters either.

👉 So, if you're trying to identify a pattern that appears after your current match, use lookahead; if you're looking for a pattern before your match, use lookbehind.
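Both directions can be combined in one pattern. As a small sketch with an invented sample string, the following extracts words enclosed in square brackets without capturing the brackets themselves.

```python
import re

text = "[alpha] beta [gamma]"

# Lookbehind checks for "[" on the left, lookahead for "]" on the right;
# neither bracket becomes part of the match.
inside_brackets = re.findall(r"(?<=\[)\w+(?=\])", text)
print(inside_brackets)  # ['alpha', 'gamma']
```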

Python Regex Non-Greedy Lookahead

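A lookahead itself is neither greedy nor non-greedy, because it consumes nothing. What you can control is the quantifier in front of it: the *? quantifier matches as few characters as possible before the lookahead succeeds.

👉 For example, to match everything up to the first whitespace character or comma, you could use the regex '.*?(?=[\s,])'. Avoid a lookahead such as (?=\s*): it can match zero characters, so it succeeds at every position.

With the non-greedy quantifier, the match stops right before the first whitespace character or comma. The greedy version '.*(?=[\s,])' would stop before the last one instead.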
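The effect of a non-greedy quantifier in front of a lookahead can be sketched with a short comparison. The sample string and the comma lookahead are assumptions chosen for illustration.

```python
import re

text = "a, b, c"

# Lazy: expand one character at a time until a comma follows.
lazy = re.search(r".*?(?=,)", text).group()

# Greedy: grab everything, then back off to the LAST spot before a comma.
greedy = re.search(r".*(?=,)", text).group()

# A lookahead that can match nothing succeeds immediately, so the lazy
# quantifier never needs to consume a single character.
always_true = re.search(r".*?(?=\s*)", text).group()

print(repr(lazy))         # 'a'
print(repr(greedy))       # 'a, b'
print(repr(always_true))  # ''
```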

Python Regex Lookbehind Variable Length

Python's built-in re module does not support variable-length lookbehind patterns. A lookbehind such as (?<=a+) raises an error saying that the look-behind requires a fixed-width pattern, so if you need a variable-length prefix, you'll have to find another approach.

One common workaround is to match the prefix normally and wrap the part you want in a capturing group, like '<prefix>(<pattern>)', and then read the group instead of the whole match. Another option is the third-party regex module, which accepts variable-length lookbehind.
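The fixed-width restriction and a capturing-group alternative can be sketched as follows. The sample text and names are illustrative assumptions; the workaround shown is one option among several.

```python
import re

text = "Dr. Smith met Mrs. Jones"

# "Dr" and "Mrs" differ in length, so re rejects them inside a lookbehind.
try:
    re.compile(r"(?<=Dr|Mrs)\. \w+")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match the title normally and capture only the name.
names = re.findall(r"(?:Dr|Mrs)\. (\w+)", text)

print(lookbehind_ok)  # False
print(names)          # ['Smith', 'Jones']
```

When a pattern contains exactly one capturing group, re.findall() returns the group contents rather than the full matches, which is what makes this workaround convenient.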

Python Regex Lookbehind Fixed Width

If you're using a fixed-width pattern in your lookbehind, things are much simpler. You can simply use (?<=<pattern>) syntax to achieve this.

👉 For example, to match a three-digit number preceded by "No.", you could use (?<=No\.)\d{3}. You'll only match the three-digit number, but the "No." part won't be included in the match.

Python Regex Lookahead End of String

To check if your pattern is followed by the end of the string, you can use the $ anchor in your lookahead.

👉 For example, if you want to match a sequence of digits only if it's at the end of a string, you can use '\d+(?=$)'. This will match one or more digits, but only if they're followed by the end of the string.
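Here is a brief sketch of the end-of-string lookahead with made-up sample strings. It also shows a detail worth knowing: without the re.MULTILINE flag, $ still matches just before a trailing newline, whereas \Z matches only at the very end.

```python
import re

text = "room 12, floor 3"

# Only the digits that run up to the end of the string match.
trailing_digits = re.findall(r"\d+(?=$)", text)

# "$" tolerates one trailing newline; "\Z" does not.
dollar_with_newline = re.findall(r"\d+(?=$)", "42\n")
strict_with_newline = re.findall(r"\d+(?=\Z)", "42\n")

print(trailing_digits)      # ['3']
print(dollar_with_newline)  # ['42']
print(strict_with_newline)  # []
```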

Python Regex Conditional Lookahead

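Conditional lookahead allows you to specify different patterns based on whether your lookahead succeeds or fails. Python's re module has no dedicated syntax for it, but you can emulate it with (?:(?=A)X|(?!A)Y). If the lookahead A succeeds, the pattern X is tried; if it fails, the pattern Y is tried. Because a lookahead is zero-width, X starts matching at the same position that A inspected.

👉 For example, (?:(?=\d)Even|Odd) can never match "Even": the lookahead requires a digit at the position where "Even" would have to start. To match "Even" only when it's followed by a digit, or "Odd" anywhere, put the lookahead after the word: (?:Even(?=\d)|Odd).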
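The position of the lookahead inside the alternation matters. This sketch, using an invented sample string, compares a lookahead placed before the word with one placed after it.

```python
import re

text = "Even2 Even Odd"

# The lookahead demands a digit where "Even" must start, so only "Odd"
# can ever match.
lookahead_first = re.findall(r"(?:(?=\d)Even|Odd)", text)

# Lookahead after the word: "Even" matches only when a digit follows.
lookahead_after = re.findall(r"(?:Even(?=\d)|Odd)", text)

print(lookahead_first)  # ['Odd']
print(lookahead_after)  # ['Even', 'Odd']
```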

Python Lookahead Iterator and Assertion (+Example)

Using a lookahead iterator means you’re looping over the matches using a lookahead assertion.

For example, if you want to match all instances of the word "apple" that are followed by a space but not include the spaces in the match, you can use the following approach:

import re
text = "I love apple , but not as much as apple pie."
pattern = re.compile(r"apple(?=\s)")
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # Output: ['apple', 'apple']

This example uses a lookahead assertion (?=\s) to ensure "apple" is followed by a space, and an iterator to loop over all matches.
