The Rabin-Karb algorithm solves the string-search problem. It was proposed by Richard Karp and Michael Rabin in 1987. Both creators have received the highest award in the area of computer science: the Turing award!
The algorithm is more efficient than the trivial solution of checking all consecutive substrings in an original string.
First, have a look at an implementation of the Rabin-Karb algorithm in Python:
def RabinKarp(string, pattern): n, m = len(string), len(pattern) hpattern = hash(pattern); for i in range(n-m+1): hs = hash(string[i:i+m]) if hs == hpattern: if string[i:i+m] == pattern: return i return -1 s1 = "Ronaldo is better than Messi" print(RabinKarp(s1, "Ronaldo")) # 0 print(RabinKarp(s1, "Football")) # -1 print(RabinKarp(s1, "Messi")) # 23
I used the pseudocode algorithm from the Wikipedia article to create the Python version provided in this article.
There’s a large body of literature concerning the efficient search of strings.
In Python, you can search a string s1 for a substring s2 simply by using the expression
s2 in s1. However, this does not tell you anything about how this actually works.
How does the algorithm work?
The naive string search algorithm simply iterates over all indices of the string s1. It then tries to match all characters of string s2. If it fails, it proceeds with the next index. However, this algorithm has O(len(s1) * len(s2)) worst-case time complexity.
The Rabin-Karb algorithm is more efficient. The improvement comes from using slicing to access the substring starting in index i and comparing the hash values instead of the substrings themselves. This is more efficient because calculating a hash value is much faster than checking equality of two strings. However, two different strings may result in the same hash value. Therefore, the Rabin-Karb algorithm needs to make sure to exclude this case.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.