The Rabin-Karb Algorithm in Python

The Rabin-Karb algorithm solves the string-search problem. It was proposed by Richard Karp and Michael Rabin in 1987. Both creators have received the highest award in the area of computer science: the Turing award!

The algorithm is more efficient than the trivial solution of checking all consecutive substrings in an original string.

First, have a look at an implementation of the Rabin-Karb algorithm in Python:

def RabinKarp(string, pattern):
    n, m = len(string), len(pattern)
    hpattern = hash(pattern);
    for i in range(n-m+1):
        hs = hash(string[i:i+m])
        if hs == hpattern:
            if string[i:i+m] == pattern:
                return i
    return -1


s1 = "Ronaldo is better than Messi"

print(RabinKarp(s1, "Ronaldo"))
# 0

print(RabinKarp(s1, "Football"))
# -1

print(RabinKarp(s1, "Messi"))
# 23

I used the pseudocode algorithm from the Wikipedia article to create the Python version provided in this article.

There’s a large body of literature concerning the efficient search of strings.

In Python, you can search a string s1 for a substring s2 simply by using the expression s2 in s1. However, this does not tell you anything about how this actually works.

How does the algorithm work?

The naive string search algorithm simply iterates over all indices of the string s1. It then tries to match all characters of string s2. If it fails, it proceeds with the next index. However, this algorithm has O(len(s1) * len(s2)) worst-case time complexity.

The Rabin-Karb algorithm is more efficient. The improvement comes from using slicing to access the substring starting in index i and comparing the hash values instead of the substrings themselves. This is more efficient because calculating a hash value is much faster than checking equality of two strings. However, two different strings may result in the same hash value. Therefore, the Rabin-Karb algorithm needs to make sure to exclude this case.

Leave a Comment