Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I hash a string to an int using c++?

I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?

like image 301
zebraman Avatar asked Mar 29 '10 01:03

zebraman


People also ask

Can we do hashing in C?

A Hash Table in C/C++ (Associative array) is a data structure that maps keys to values. This uses a hash function to compute indexes for a key. Based on the Hash Table index, we can store the value at the appropriate location.

Can I hash a string?

It's perfectly fine to have hashes as strings. For example, one of the most popular hash functions, SHA 256 returns a 64 character string that has both letters and numbers. Well, SHA256 returns 256 bits.

How do you hash an int?

The most commonly used method for hashing integers is called modular hashing: we choose the array size M to be prime, and, for any positive integer key k, compute the remainder when dividing k by M. This function is very easy to compute (k % M, in Java), and is effective in dispersing the keys evenly between 0 and M-1.

What is djb2?

The djb2 algorithm has a hash function for strings. unsigned long hash = 5381; int c; while (c = *str++) hash = ((hash << 5) + hash) + c; /* hash * 33 + c */


2 Answers

From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):

djb2

this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.

unsigned long hash(unsigned char *str) {
    unsigned long hash = 5381;
    int c;

    while (c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}
like image 154
Tim Cooper Avatar answered Sep 23 '22 15:09

Tim Cooper


Re the first question, sure, e.g, something like:

int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
  hash = hash << 1 | (*it - offset);
}

regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).

like image 29
Alex Martelli Avatar answered Sep 22 '22 15:09

Alex Martelli