I've been working on a Rabin-Karp string matching function in C++ and I'm not getting any results out of it. I have a feeling that I'm not computing some of the values correctly, but I don't know which one(s).
Prototype
void rabinKarp(string sequence, string pattern, int d, int q);
Function Implementation
void rabinKarp(string sequence, string pattern, int d, int q)
{
//d is the |∑|
//q is the prime number to use to lessen spurious hits
int n = sequence.length(); //Length of the sequence
int m = pattern.length(); //Length of the pattern
double temp = static_cast<double> (m - 1.0);
double temp2 = pow(static_cast<double> (d), temp); //Exponentiate d
int h = (static_cast<int>(temp2)) % q; //High Order Position of an m-digit window
int p = 0; //Pattern decimal value
int t = 0; //Substring decimal value
for (int i = 1; i < m; i++) { //Preprocessing
p = (d*p + (static_cast<int>(pattern[i]) - 48)) % q;
t = (d*t + (static_cast<int>(sequence[i])-48)) % q;
}
for (int s = 0; s < (n-m); s++) { //Matching(Iterate through all possible shifts)
if (p == t) {
for (int j = 0; j < m; j++) {
if (pattern[j] == sequence[s+j]) {
cout << "Pattern occurs with shift: " << s << endl;
}
}
}
if (s < (n-m)) {
t = (d*(t - ((static_cast<int>(sequence[s+1]) - 48)*h)) + (static_cast<int>(sequence[s + m + 1]) - 48)) % q;
}
}
return;
}
In my function call I pass 2359023141526739921 as the sequence, 31415 as the pattern, 10 as the radix, and 13 as the prime. I expect there to be one actual match and one spurious hit, but I never get the output statement from the matching part of the function. What am I doing wrong?
Thanks in Advance, Madison
The big gotcha in coding the Rabin Karp is the modulo operator. When two numbers X and Y are congruent modulo Q then (X % Q) should equal (Y % Q) but on the C++ compiler you're using they will only be equal if X and Y are both positive or both negative. If X is positive and Y is negative then (X % Q) will be positive and (Y % Q) will negative. In fact (X % Q)-Q == (Y % Q) in this case.
The work around is to check for negative values after each modulo and if there are any to add q to the variable, so your preprocessing loop becomes :
p = (d*p + pattern[i]) % q;
if ( p < 0 ) p += q;
t = (d*t + sequence[i]) % q;
if ( t < 0 ) t += q;
t in the main loop needs to have a similar check added.
Unless you've redefined ^
, it is computing xor, not exponentiation. Also, you should be careful about overflowing the maximum value of an int
before you perform %
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With