Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best hash function for Rabin-Karp algorithm?

I'm looking for an efficient hash function for Rabin-Karp algorithm. Here is my actual code (C programming language).

static bool f2(char const *const s1, size_t const n1, 
               char const *const s2, size_t const n2)
{
    uintmax_t hsub = hash(s2, n2);
    uintmax_t hs   = hash(s1, n1);
    size_t   nmax = n2 - n1;

    for (size_t i = 0; i < nmax; ++i) {
        if (hs == hsub) {
            if (strncmp(&s1[i], s2, i + n2 - 1) == 0)
                return true;
        }
        hs = hash(&s1[i + 1], i + n2);
    }
    return false;
}

I considered some Rabin-Karp C implementations, but there are differences between all the codes. So my question is: what are the characteristics that a Rabin-Karp hash function should have?

like image 367
md5 Avatar asked Jul 18 '12 17:07

md5


People also ask

What is the hash function in Rabin-Karp?

The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is the numeric value of a string. For example, if all possible characters are from 1 to 10, the numeric value of “122” will be 122.

What is the best principle in Rabin-Karp algorithm?

Explanation: The basic principle employed in Rabin Karp algorithm is hashing. In the given text every substring is converted to a hash value and compared with the hash value of the pattern.

Why does using a rolling hash make the Rabin-Karp algorithm more efficient?

Rabin-Karp improves upon this concept by utilising the fact that comparing the hashes of two strings can be done in linear time and is far more efficient than comparing the individual characters of those strings to find a match. Thus, providing a best case runtime complexity of O(n+m).


1 Answers

A extremly good performing hash is the bernstein hash. It even outruns many popular hashing algorithms.

unsigned bernstein_hash ( void *key, int len )
{
    unsigned char *p = key;
    unsigned h = 0;
    int i;

    for ( i = 0; i < len; i++ )
        h = 33 * h + p[i];

    return h;
}

Of course, you can try out other hashing algorithms, as described here: Hash function on NIST

Note: It has never been explained why the 33 is performing so much better than any other "more logic" constant.

For your interest: Here is a good comparison of different hash algorithms: strchr comparison of hash algorithms

like image 71
Mare Infinitus Avatar answered Oct 26 '22 13:10

Mare Infinitus