I'm looking for an efficient hash function for Rabin-Karp algorithm. Here is my actual code (C programming language).
static bool f2(char const *const s1, size_t const n1,
char const *const s2, size_t const n2)
{
uintmax_t hsub = hash(s2, n2);
uintmax_t hs = hash(s1, n1);
size_t nmax = n2 - n1;
for (size_t i = 0; i < nmax; ++i) {
if (hs == hsub) {
if (strncmp(&s1[i], s2, i + n2 - 1) == 0)
return true;
}
hs = hash(&s1[i + 1], i + n2);
}
return false;
}
I considered some Rabin-Karp C implementations, but there are differences between all the codes. So my question is: what are the characteristics that a Rabin-Karp hash function should have?
The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is the numeric value of a string. For example, if all possible characters are from 1 to 10, the numeric value of “122” will be 122.
Explanation: The basic principle employed in Rabin Karp algorithm is hashing. In the given text every substring is converted to a hash value and compared with the hash value of the pattern.
Rabin-Karp improves upon this concept by utilising the fact that comparing the hashes of two strings can be done in linear time and is far more efficient than comparing the individual characters of those strings to find a match. Thus, providing a best case runtime complexity of O(n+m).
A extremly good performing hash is the bernstein hash. It even outruns many popular hashing algorithms.
unsigned bernstein_hash ( void *key, int len )
{
unsigned char *p = key;
unsigned h = 0;
int i;
for ( i = 0; i < len; i++ )
h = 33 * h + p[i];
return h;
}
Of course, you can try out other hashing algorithms, as described here: Hash function on NIST
Note: It has never been explained why the 33
is performing so much better
than any other "more logic" constant.
For your interest: Here is a good comparison of different hash algorithms: strchr comparison of hash algorithms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With