I am having trouble understanding how the rolling hash algorithm works after the hash has been reduced to modulus value by dividing by a prime number.
Consider the sequence of 5 digits in the number 123456
.
The first chunk is 12345
. I store the value, in the next window, 6 comes in and 1 goes out.
So the new hash will be (12345-1*10^4)*10 + 6 = 23456
. This is fairly intuitive.
As obvious, these numbers are large, so we need a modulus function to keep them small. Suppose I take 101
as the prime number for this purpose.
So 12345
will reduce to 23
. How then, from this, will I derive the rolling hash for the next window, 23456
?
By using the rolling hash function, we can calculate the hash of each sliding window in linear time. This is the main component of the Rabin-Karp algorithm that provides its best case time complexity of O(n+m).
Initially calculate the hash value of the pattern. Start iterating from the starting of the string: Calculate the hash value of the current substring having length m. If the hash value of the current substring and the pattern are same check if the substring is same as the pattern.
What is the basic principle in Rabin Karp algorithm? Explanation: The basic principle employed in Rabin Karp algorithm is hashing. In the given text every substring is converted to a hash value and compared with the hash value of the pattern.
A rolling hash (also known as recursive hashing or rolling checksum) is a hash function where the input is hashed in a window that moves through the input.
You calculate it the same way you calculate 23456
, but always with modulo 101
.
(((23 - (10^4 mod 101))*10) mod 101 + 6) mod 101 = 24.
which is the value you want because 23456 mod 101 = 24
.
Answer by @dejvuth is right - i would add this specifically when doing rabin-karp is that sometimes you may end up with a -ve modulus value - in that case it is better to take the +ve equivalent of that modulus value - so that checking if the same modulus was seen before is easier.
For example:
with this pattern "abcdabc"
-
and hash function:
hash(i) = (49*S[i]+7*S[i+1]+1*S[i+2])%1123
Results in:
"abc" -> 1046
"bcd" -> 1103
"cda" -> 33
"dab" -> 62
"abc" -> -77
second occurence of "abc"
results is -77
that is modulo equivalent of 1046
since (-77 + 1123 = 1046)
PS: i don't have enough "reputation" at this time to add this as a comment..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With