Table of Contents

## Motivation

Type `"helo world"`

into your Google search bar and Google will ask you: `"Did you mean: `

. How is this done?** hello** world"

A simple method to detect these *edit distance*). In fact, Google’s algorithm seems to use some variant of it. (source)

By studying this article, youāll learn about the important practical algorithm to calculate the āLevenshtein distanceā or āedit distanceā.

**Applications**: The Levenshtein distance has important applications. Think about the auto-correction functionality on your smartphone. Say, you type in `"helo"`

in your WhatsApp messenger. Your smartphone then selects several high probability words and sorts them (e.g. by Levenshtein distance). For example, the one with minimal Levenshtein distance (and, hence, maximal similarity) is the string `"hello"`

. Thus, it may automatically correct `"helo"`

to `"hello"`

.

## Defining the Edit Distance

The Levenshtein distance is a metric to calculate the **distance between two strings**. It helps you to quantify how āsimilarā two strings are. The Levenshtein distance is also called āedit distanceā which describes precisely what it measures:

**Definition**: The edit/Levenshtein distance is defined as the number of character edits (*insertions*, *removals*, or *substitutions*) that are needed to transform one string into another.

The intuition is the following: the smaller the Levenshtein distance, the more similar the strings.

## Example Edit Distance

Letās consider an example with two strings `"cat"`

and `"chello"`

. How to calculate the Levenshtein distance in this scenario?

We already know that the Levenshtein distance computes the minimal number of edits (*insert*, *delete*, or *replace*) to reach the second string starting from the first string.

Here is one minimal sequence:

`"cat"`

`"`

(replacecht "`"a"`

by`"h"`

)`"`

(replaceche "`"t"`

by`"e"`

)`"`

(insertchel "`"l"`

at position 3)`"`

(insertchell "`"l"`

at position 4)`"`

(insertchello "`"o"`

at position 5)

In this way, we can transform the string `"cat"`

in the string `"chello"`

in five editing steps ā the Levenshtein distance is 5.

## Calculating the Edit Distance in Python Using a Library

If you’re not interested in creating your own implementation, you can simply install the `editdistance`

library by using pip:

pip install editdistance

Now, you can run it using the `editdistance.eval()`

function with the two strings as arguments:

import editdistance editdistance.eval('banana', 'bahama') # 2L

Okay, let’s have a look at a more beautiful one-liner solution with detailed explanation next.

## Python Recursive Edit Distance

**Problem Statement**: Write a Python one-liner that calculates the Levenshtein distance of two strings `a`

and `b`

.

## The Data a = "cat" b = "chello" c = "chess" ## The One-Liner ls = lambda a, b: len(b) if not a else len(a) if not b \ else min(ls(a[1:],b[1:]) + (a[0]!=b[0]), ls(a[1:],b) + 1, ls(a,b[1:]) + 1) ## The Result print(ls(a,b)) print(ls(a,c)) print(ls(b,c))

**Listing: **Calculating the Levenshtein distance of two strings in one line.

**Exercise**: Whatās the output of this code snippet?

Before I explain the one-liner to you, let’s first rewrite this naive recursive algorithm to a normal multi-line Python function if, unlike me, you don’t like concise Python code:

a = "cat" b = "chello" c = "chess" def ls(a, b): # Recursion base cases if not a: return len(b) if not b: return len(a) # Replace first character if a[0] != b[0]: d_1 = ls(a[1:], b[1:]) + 1 else: d_1 = ls(a[1:], b[1:]) # Remove first character d_2 = ls(a[1:], b) + 1 # Insert first character d_3 = ls(a, b[1:]) + 1 # Edit distance is minimum return min(d_1, d_2, d_3) print(ls(a, b)) # 5 print(ls(a, c)) # 4 print(ls(b, c)) # 3

Before we dive into the code, letās quickly explore an important Python trick we heavily exploit in the one-liner.

In Python, every object has a truth value ā while you are either good or bad in the world of Harry Potter, you are either `True`

or `False`

in the world of Python! Most objects are in fact `True`

. But a few objects are `False`

:

- Zero
`0`

and`0.0`

is`False`

- The empty string
`''`

is`False`

- The empty list
`[]`

is`False`

- The empty dict or set
`{}`

is`False`

š” **Remember**: As a rule of thumb, Python objects are considered `False`

if they are empty or zero.

Equipped with this information, you can now easily understand the first part of the Levenshtein function:

We create a lambda function that returns the number of edits required to transform a string `a`

into`b`

.

There are two trivial cases:

- Suppose string
`a`

is empty. In this case, the minimal edit distance is

insertions of the characters in stringlen (b)`b`

. We cannot do better. - Similarly, if string
`b`

is empty, the minimal edit distance is

.len (a)

Thus, we can directly return the correct edit distance if either of the strings is empty.

Letās say both strings are non-empty (otherwise the solution is trivial as shown previously). Now, we can simplify the problem in three ways.

First, we ignore the leading characters of both strings `a`

and `b`

and calculate the edit distance from slices (i.e., substrings) `a[1:]`

to `b[1:]`

in a recursive manner. If the leading characters `a[0]`

and `b[0]`

are different, we have to fix it by replacing `a[0]`

by `b[0]`

. Hence, we increment the edit distance by one if they are different.

Second, we remove the first character `a[0]`

. Now, we check the minimal edit distance recursively for this smaller problem. As we have removed a character, we increment the result by one.

Third, we (conceptually) insert the character `b[0]`

to the beginning of the word `a`

. Now, we can reduce this problem to the smaller problem that arises if we remove the first character of `b`

. As we have performed one edit operation (inserting), we increment the result by one.

Finally, we simply take the minimum edit distance of all three results (replace the first character, remove the first character, insert the first character).

This one-liner solution demonstrates once again the importance of training your recursion skills ā recursion may not come naturally to you but rest assured that it will after studying many recursive problems like this one.

## Python One-Liners Book: Master the Single Line First!

**Python programmers will improve their computer science skills with these useful one-liners.**

*Python One-Liners* will teach you how to read and write “one-liners”: ** concise statements of useful functionality packed into a single line of code. **You’ll learn how to systematically unpack and understand any line of Python code, and write eloquent, powerfully compressed Python like an expert.

The book’s five chapters cover (1) tips and tricks, (2) regular expressions, (3) machine learning, (4) core data science topics, and (5) useful algorithms.

Detailed explanations of one-liners introduce ** key computer science concepts **and

**. You’ll learn about advanced Python features such as**

*boost your coding and analytical skills**,*

**list comprehension****,**

*slicing***,**

*lambda functions***,**

*regular expressions***and**

*map***functions, and**

*reduce***.**

*slice assignments*You’ll also learn how to:

- Leverage data structures to
**solve real-world problems**, like using Boolean indexing to find cities with above-average pollution - Use
**NumPy basics**such as*array*,*shape*,*axis*,*type*,*broadcasting*,*advanced indexing*,*slicing*,*sorting*,*searching*,*aggregating*, and*statistics* - Calculate basic
**statistics**of multidimensional data arrays and the K-Means algorithms for unsupervised learning - Create more
**advanced regular expressions**using*grouping*and*named groups*,*negative lookaheads*,*escaped characters*,*whitespaces, character sets*(and*negative characters sets*), and*greedy/nongreedy operators* - Understand a wide range of
**computer science topics**, including*anagrams*,*palindromes*,*supersets*,*permutations*,*factorials*,*prime numbers*,*Fibonacci*numbers,*obfuscation*,*searching*, and*algorithmic sorting*

By the end of the book, you’ll know how to ** write Python at its most refined**, and create concise, beautiful pieces of “Python art” in merely a single line.

*Get your Python One-Liners on Amazon!!*

## Additional Implementations Edit Distance Python

There’s a wealth of code already implemented in Python to solve the edit distance problem. Next, I’ll list the most relevant resources for your convenience:

- A collection of Python algorithms to calculate the edit distance with different runtime complexities: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python
- Edit distance in different programming languages: https://www.geeksforgeeks.org/edit-distance-dp-5/
- Complete guide on edit distance: https://python-course.eu/applications-python/levenshtein-distance.php
- Edit distance Python library
`edist`

: https://gitlab.ub.uni-bielefeld.de/bpaassen/python-edit-distances. You can also`pip install edist`

in your Python code.

While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.