
Choosing the radix and prime modulus in the Rabin-Karp rolling hash

The hash function is explained on Wikipedia

It says, "The choice of a and n is critical to get good hashing;" and refers to a Linear congruential generator article that doesn't feel relevant. I can't figure out how the values of a (the radix) and n (the modulus) are chosen. Any suggestions?

Aks asked Nov 11 '22 13:11


1 Answer

The basis of this algorithm is that, over a field, a nonzero polynomial of degree at most d has at most d zeros. Each length-k string has its own associated polynomial of degree at most k - 1, with the characters as coefficients, and its hash is that polynomial evaluated at a, modulo n. We screen for possible matches by comparing hashes, which amounts to subtracting the polynomials of the strings in question and evaluating the difference at a. If the strings are equal, then the result is always zero. If the strings are not equal (and n is large enough that distinct characters stay distinct mod n), then the difference is a nonzero polynomial, so the result is zero if and only if a is one of its at most k - 1 zeros. This is the fact that puts the primality requirement on n: the integers mod n are a field only when n is prime.
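To make the zero-counting argument concrete, here is a minimal Python sketch. The helper names poly_hash and count_zeros are made up for illustration, and the small moduli 7, 8 and 101 are chosen only so the brute-force count is readable. It counts the zeros of x^2 - 1 under a prime and a composite modulus, and shows two unequal strings colliding exactly when a is a zero of their difference polynomial.

```python
def poly_hash(s, a, n):
    """Evaluate the string's polynomial at a, modulo n, using Horner's rule."""
    h = 0
    for ch in s:
        h = (h * a + ord(ch)) % n
    return h


def count_zeros(coeffs, n):
    """Count x in [0, n) where the polynomial is 0 mod n.

    coeffs lists the coefficients from the highest degree down.
    """
    zeros = 0
    for x in range(n):
        v = 0
        for c in coeffs:
            v = (v * x + c) % n
        if v == 0:
            zeros += 1
    return zeros


# x^2 - 1 has degree 2, so over a field it has at most 2 zeros.
print(count_zeros([1, 0, -1], 7))  # prime modulus
print(count_zeros([1, 0, -1], 8))  # composite modulus: the bound can fail

# "ab" and "ba" differ by the polynomial -x + 1, whose only zero is x = 1.
print(poly_hash("ab", 1, 101) == poly_hash("ba", 1, 101))  # a = 1 is that zero
print(poly_hash("ab", 2, 101) == poly_hash("ba", 2, 101))  # a = 2 is not
```

With modulus 8 the polynomial x^2 - 1 vanishes at 1, 3, 5 and 7, which is more zeros than its degree allows over a field; that is the failure the primality requirement rules out.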

In theory, at least, we want a to be chosen at random so that an oblivious adversary (one who fixes the input without knowing a) cannot create false positives with any frequency. If we don't expect trouble, then it might be better to choose a so that multiplication by a is cheap (e.g., the binary expansion of a has a small number of one bits). Nevertheless, some choices are bad on typical string sets (e.g., with a = 1 the hash is just the sum of the characters, so all anagrams collide). We want n to be large enough to make false positives by random chance unlikely (probability at most (k - 1)/n per comparison of two unequal strings when a is uniformly random) but small enough, and preferably of a special form, so that the modulo computations are efficient.
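As a rough sketch of how these choices might look in practice, the Python below picks a uniformly at random and uses the Mersenne prime 2^61 - 1 as n. Both are assumptions made for illustration, not values the answer prescribes, and the names mod_mersenne and find_all are hypothetical. The special form of n lets a product be reduced with shifts and masks, because 2^61 is congruent to 1 mod n.

```python
import random

# Assumed modulus for this sketch: the Mersenne prime 2^61 - 1.
N = (1 << 61) - 1


def mod_mersenne(x):
    """Reduce 0 <= x < N*N modulo N without division (2^61 == 1 mod N)."""
    x = (x & N) + (x >> 61)
    x = (x & N) + (x >> 61)
    return x - N if x >= N else x


def find_all(text, pattern, a=None):
    """Return every index where pattern occurs in text (Rabin-Karp)."""
    k = len(pattern)
    if k == 0 or k > len(text):
        return []
    if a is None:
        # Random radix, skipping the degenerate choices 0 and 1.
        a = random.randrange(2, N - 1)
    top = pow(a, k - 1, N)  # weight of the character leaving the window
    hp = ht = 0
    for i in range(k):
        hp = mod_mersenne(hp * a + ord(pattern[i]))
        ht = mod_mersenne(ht * a + ord(text[i]))
    hits = []
    for i in range(len(text) - k + 1):
        # Equal hashes only screen for a match; confirm by direct comparison.
        if ht == hp and text[i:i + k] == pattern:
            hits.append(i)
        if i + k < len(text):
            # Drop text[i], shift by a, then bring in text[i + k].
            ht = (ht - ord(text[i]) * top) % N
            ht = mod_mersenne(ht * a + ord(text[i + k]))
    return hits


print(find_all("abracadabra", "abra"))
```

In Python the shift-and-mask reduction is not faster than the % operator; it is included only to show why a modulus of this form is cheap in languages with fixed-width integers. The direct string comparison on a hash match keeps the result correct even when a false positive does occur.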

David Eisenstat answered Nov 15 '22 06:11
