"helo world" into your Google search bar and Google will ask you:
"Did you mean: hello world". How is this done?
A simple method to detect these
By studying this article, you’ll learn about the important practical algorithm to calculate the “Levenshtein distance” or “edit distance”.
Applications: The Levenshtein distance has important applications. Think about the auto-correction functionality on your smartphone. Say, you type in
"helo" in your WhatsApp messenger. Your smartphone then selects several high probability words and sorts them (e.g. by Levenshtein distance). For example, the one with minimal Levenshtein distance (and, hence, maximal similarity) is the string
"hello". Thus, it may automatically correct
Defining the Edit Distance
The Levenshtein distance is a metric to calculate the distance between two strings. It helps you to quantify how “similar” two strings are. The Levenshtein distance is also called “edit distance” which describes precisely what it measures:
Definition: The edit/Levenshtein distance is defined as the number of character edits (insertions, removals, or substitutions) that are needed to transform one string into another.
The intuition is the following: the smaller the Levenshtein distance, the more similar the strings.
Example Edit Distance
Let’s consider an example with two strings
"chello". How to calculate the Levenshtein distance in this scenario?
We already know that the Levenshtein distance computes the minimal number of edits (insert, delete, or replace) to reach the second string starting from the first string.
Here is one minimal sequence:
"l"at position 3)
"l"at position 4)
"o"at position 5)
In this way, we can transform the string
"cat" in the string
"chello" in five editing steps – the Levenshtein distance is 5.
Calculating the Edit Distance in Python Using a Library
pip install editdistance
Now, you can run it using the
editdistance.eval() function with the two strings as arguments:
import editdistance editdistance.eval('banana', 'bahama') # 2L
Okay, let’s have a look at a more beautiful one-liner solution with detailed explanation next.
Python Recursive Edit Distance
Problem Statement: Write a Python one-liner that calculates the Levenshtein distance of two strings
## The Data a = "cat" b = "chello" c = "chess" ## The One-Liner ls = lambda a, b: len(b) if not a else len(a) if not b \ else min(ls(a[1:],b[1:]) + (a!=b), ls(a[1:],b) + 1, ls(a,b[1:]) + 1) ## The Result print(ls(a,b)) print(ls(a,c)) print(ls(b,c))
Listing: Calculating the Levenshtein distance of two strings in one line.
Exercise: What’s the output of this code snippet?
Before I explain the one-liner to you, let’s first rewrite this naive recursive algorithm to a normal multi-line Python function if, unlike me, you don’t like concise Python code:
a = "cat" b = "chello" c = "chess" def ls(a, b): # Recursion base cases if not a: return len(b) if not b: return len(a) # Replace first character if a != b: d_1 = ls(a[1:], b[1:]) + 1 else: d_1 = ls(a[1:], b[1:]) # Remove first character d_2 = ls(a[1:], b) + 1 # Insert first character d_3 = ls(a, b[1:]) + 1 # Edit distance is minimum return min(d_1, d_2, d_3) print(ls(a, b)) # 5 print(ls(a, c)) # 4 print(ls(b, c)) # 3
Before we dive into the code, let’s quickly explore an important Python trick we heavily exploit in the one-liner.
In Python, every object has a truth value – while you are either good or bad in the world of Harry Potter, you are either
False in the world of Python! Most objects are in fact
True. But a few objects are
- The empty string
- The empty list
- The empty dict or set
💡 Remember: As a rule of thumb, Python objects are considered
False if they are empty or zero.
Equipped with this information, you can now easily understand the first part of the Levenshtein function:
We create a lambda function that returns the number of edits required to transform a string
There are two trivial cases:
- Suppose string
ais empty. In this case, the minimal edit distance is
insertions of the characters in string
b. We cannot do better.
- Similarly, if string
bis empty, the minimal edit distance is
Thus, we can directly return the correct edit distance if either of the strings is empty.
Let’s say both strings are non-empty (otherwise the solution is trivial as shown previously). Now, we can simplify the problem in three ways.
First, we ignore the leading characters of both strings
b and calculate the edit distance from slices (i.e., substrings)
b[1:] in a recursive manner. If the leading characters
b are different, we have to fix it by replacing
b. Hence, we increment the edit distance by one if they are different.
Second, we remove the first character
a. Now, we check the minimal edit distance recursively for this smaller problem. As we have removed a character, we increment the result by one.
Third, we (conceptually) insert the character
b to the beginning of the word
a. Now, we can reduce this problem to the smaller problem that arises if we remove the first character of
b. As we have performed one edit operation (inserting), we increment the result by one.
Finally, we simply take the minimum edit distance of all three results (replace the first character, remove the first character, insert the first character).
This one-liner solution demonstrates once again the importance of training your recursion skills – recursion may not come naturally to you but rest assured that it will after studying many recursive problems like this one.
Python One-Liners Book: Master the Single Line First!
Python programmers will improve their computer science skills with these useful one-liners.
Python One-Liners will teach you how to read and write “one-liners”: concise statements of useful functionality packed into a single line of code. You’ll learn how to systematically unpack and understand any line of Python code, and write eloquent, powerfully compressed Python like an expert.
The book’s five chapters cover (1) tips and tricks, (2) regular expressions, (3) machine learning, (4) core data science topics, and (5) useful algorithms.
Detailed explanations of one-liners introduce key computer science concepts and boost your coding and analytical skills. You’ll learn about advanced Python features such as list comprehension, slicing, lambda functions, regular expressions, map and reduce functions, and slice assignments.
You’ll also learn how to:
- Leverage data structures to solve real-world problems, like using Boolean indexing to find cities with above-average pollution
- Use NumPy basics such as array, shape, axis, type, broadcasting, advanced indexing, slicing, sorting, searching, aggregating, and statistics
- Calculate basic statistics of multidimensional data arrays and the K-Means algorithms for unsupervised learning
- Create more advanced regular expressions using grouping and named groups, negative lookaheads, escaped characters, whitespaces, character sets (and negative characters sets), and greedy/nongreedy operators
- Understand a wide range of computer science topics, including anagrams, palindromes, supersets, permutations, factorials, prime numbers, Fibonacci numbers, obfuscation, searching, and algorithmic sorting
By the end of the book, you’ll know how to write Python at its most refined, and create concise, beautiful pieces of “Python art” in merely a single line.
Additional Implementations Edit Distance Python
There’s a wealth of code already implemented in Python to solve the edit distance problem. Next, I’ll list the most relevant resources for your convenience:
- A collection of Python algorithms to calculate the edit distance with different runtime complexities: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python
- Edit distance in different programming languages: https://www.geeksforgeeks.org/edit-distance-dp-5/
- Complete guide on edit distance: https://python-course.eu/applications-python/levenshtein-distance.php
- Edit distance Python library
edist: https://gitlab.ub.uni-bielefeld.de/bpaassen/python-edit-distances. You can also
pip install edistin your Python code.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.