Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding how rolling hash works with modulus in Rabin Karp algorithm

I am having trouble understanding how the rolling hash algorithm works after the hash has been reduced to modulus value by dividing by a prime number.

Consider the sequence of 5 digits in the number 123456.

The first chunk is 12345. I store the value, in the next window, 6 comes in and 1 goes out.

So the new hash will be (12345-1*10^4)*10 + 6 = 23456. This is fairly intuitive.

As obvious, these numbers are large, so we need a modulus function to keep them small. Suppose I take 101 as the prime number for this purpose.

So 12345 will reduce to 23. How then, from this, will I derive the rolling hash for the next window, 23456?

like image 732
SexyBeast Avatar asked Apr 23 '16 17:04

SexyBeast


People also ask

Why does using a rolling hash make the Rabin-Karp algorithm more efficient?

By using the rolling hash function, we can calculate the hash of each sliding window in linear time. This is the main component of the Rabin-Karp algorithm that provides its best case time complexity of O(n+m).

How is hash value calculated in Rabin-Karp algorithm?

Initially calculate the hash value of the pattern. Start iterating from the starting of the string: Calculate the hash value of the current substring having length m. If the hash value of the current substring and the pattern are same check if the substring is same as the pattern.

What is the basic principle in Rabin-Karp algorithm hashing?

What is the basic principle in Rabin Karp algorithm? Explanation: The basic principle employed in Rabin Karp algorithm is hashing. In the given text every substring is converted to a hash value and compared with the hash value of the pattern.

What is rolling hash function?

A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.


2 Answers

You calculate it the same way you calculate 23456, but always with modulo 101.

(((23 - (10^4 mod 101))*10) mod 101 + 6) mod 101 = 24.

which is the value you want because 23456 mod 101 = 24.

like image 139
dejvuth Avatar answered Oct 04 '22 02:10

dejvuth


Answer by @dejvuth is right - i would add this specifically when doing rabin-karp is that sometimes you may end up with a -ve modulus value - in that case it is better to take the +ve equivalent of that modulus value - so that checking if the same modulus was seen before is easier.

For example: with this pattern "abcdabc" - and hash function: hash(i) = (49*S[i]+7*S[i+1]+1*S[i+2])%1123

Results in:

"abc" -> 1046
"bcd" -> 1103
"cda" -> 33
"dab" -> 62
"abc" -> -77

second occurence of "abc" results is -77 that is modulo equivalent of 1046 since (-77 + 1123 = 1046)

PS: i don't have enough "reputation" at this time to add this as a comment..

like image 23
Ravi Avatar answered Oct 04 '22 02:10

Ravi