In my C program, I have four 8-bit (`char`) variables stored in a structure. If I want to hash these values to create keys (each representing a whole structure) that index an array, how should I do it? (The program has many of these structures. I often need to search a symbol table to check whether a structure already exists before creating another one, and I don't know which hashing algorithm to use for a key-indexed search.)
I've thought about a hash that takes the four numbers, converts each to hexadecimal, concatenates them, and then converts the resulting number back to decimal.
But I need something lighter. This method seems wasteful, and I don't think it's well suited to producing array indices.
Is that right? Is there another kind of hash function that uses less than 32 bits, if possible?
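Writing a number in hexadecimal or decimal does not change its value, so the hex-and-concatenate idea amounts to packing the four bytes into one 32-bit integer with shifts, with no string conversion at all. A minimal sketch, assuming a struct named `quad` with members `a` to `d` (hypothetical names standing in for the four 8-bit fields):

```c
#include <stdint.h>

/* Hypothetical struct standing in for the four 8-bit fields. */
struct quad {
    uint8_t a, b, c, d;
};

/* Concatenating the two-digit hex form of each byte ("AB" "CD" "EF" "01")
 * and reading the result as one number is the same as shifting each
 * byte into its own 8-bit slot of a 32-bit integer. */
static uint32_t pack_quad(const struct quad *q)
{
    return ((uint32_t)q->a << 24) |
           ((uint32_t)q->b << 16) |
           ((uint32_t)q->c << 8)  |
            (uint32_t)q->d;
}
```

For example, `{0xAB, 0xCD, 0xEF, 0x01}` packs to `0xABCDEF01`. This gives every struct a unique 32-bit key cheaply, but on its own it is not a well-distributed table index.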
You may want to have a look at this list of hash functions.
For implementing a hash table (which I suppose is your goal), you want a hash function with a good avalanche effect, so that similar input values do not produce too many collisions.
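To make "avalanche effect" concrete, the sketch below flips one input bit and counts how many output bits change. It compares the plain packed key (identity) with MurmurHash3's 32-bit finalizer `fmix32`, which is swapped in here only as a well-known example of a mixing function; the helper names are hypothetical.

```c
#include <stdint.h>

/* Identity "hash": the packed key itself. Flipping one input bit
 * flips exactly one output bit, so similar keys stay similar. */
static uint32_t identity_hash(uint32_t k)
{
    return k;
}

/* MurmurHash3's 32-bit finalizer (fmix32), used as an example mixer. */
static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Count how many output bits change when input bit `bit` of `key` is flipped. */
static int bits_changed(uint32_t (*hash)(uint32_t), uint32_t key, int bit)
{
    uint32_t diff = hash(key) ^ hash(key ^ (UINT32_C(1) << bit));
    int count = 0;
    while (diff) {
        count += (int)(diff & 1u);
        diff >>= 1;
    }
    return count;
}
```

For the identity function the count is always 1. A mixer with good avalanche behaviour changes roughly half of the 32 output bits on average, which is what spreads near-identical structs across different buckets.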
Of course, you could use any function to turn your characters into an arbitrary integer, but if that representation does not vary across your inputs, you effectively get the performance of a linked list (imagine using one of the other suggestions with a table size of 256 when all of the structs have the same value in byte 4). What is your concern about 32-bit hashes? You would reduce the hash to an index with `hash % tablesize` anyway.
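A short sketch of the byte-4 scenario, assuming the struct is packed into a 32-bit key with byte 4 in the low 8 bits and used unmixed. With a power-of-two table size such as 256, `key % 256` keeps only that low byte, so every key sharing byte 4 lands in the same bucket. The prime size 251 is included only as a contrast (a common alternative not mentioned in this answer); a mixing hash is the more robust fix. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Keys that differ only in their first byte and all share byte 4 (0x07).
 * Intended for i < 256 so each i yields a distinct key. */
static uint32_t make_key(uint32_t i)
{
    return (i << 24) | 0x07u;
}

/* Number of keys (out of n) in the fullest bucket when the unmixed
 * packed key is reduced with key % table_size. */
static size_t fullest_bucket(size_t n, uint32_t table_size)
{
    size_t counts[512] = {0};  /* assumes table_size <= 512 */
    size_t worst = 0;
    for (uint32_t i = 0; i < n; ++i) {
        size_t b = make_key(i) % table_size;
        if (++counts[b] > worst)
            worst = counts[b];
    }
    return worst;
}
```

With 256 such keys, `fullest_bucket(256, 256)` puts all 256 in one bucket, while `fullest_bucket(256, 251)` never puts more than two in any bucket.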
Normally you wouldn't use a cryptographic hash function (e.g. MD5, SHA-1) either. Just pick a non-cryptographic one (e.g. the Pearson or Jenkins hash).
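Since the question asks for something smaller than 32 bits, here is a minimal sketch of Pearson hashing, which produces an 8-bit result with one table lookup per input byte. Pearson hashing needs a permutation of 0..255; this sketch builds one with a Fisher-Yates shuffle driven by a fixed-seed xorshift generator, which is an arbitrary assumption and not the table from Pearson's paper. The names `pearson_table`, `pearson_init`, and `pearson_hash` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t pearson_table[256];

/* Build a permutation of 0..255 with a Fisher-Yates shuffle driven by a
 * small fixed-seed xorshift32 generator. Any permutation works for
 * Pearson hashing; this particular one is an arbitrary choice. */
static void pearson_init(void)
{
    uint32_t state = 2463534242u;  /* arbitrary nonzero seed */
    for (int i = 0; i < 256; ++i)
        pearson_table[i] = (uint8_t)i;
    for (int i = 255; i > 0; --i) {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        int j = (int)(state % (uint32_t)(i + 1));
        uint8_t tmp = pearson_table[i];
        pearson_table[i] = pearson_table[j];
        pearson_table[j] = tmp;
    }
}

/* Pearson hash: 8-bit output, one table lookup per input byte. */
static uint8_t pearson_hash(const uint8_t *key, size_t len)
{
    uint8_t h = 0;
    for (size_t i = 0; i < len; ++i)
        h = pearson_table[h ^ key[i]];
    return h;
}
```

Call `pearson_init()` once at startup, then hash the four bytes of each struct. Keep in mind that an 8-bit result only addresses 256 buckets, so it suits small tables.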
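```c
/* Jenkins one-at-a-time hash, adapted from http://en.wikipedia.org/wiki/Jenkins_hash_function */
#include <stddef.h>
#include <stdint.h>

uint32_t jenkins_one_at_a_time_hash(const uint8_t *key, size_t len)
{
    uint32_t hash = 0;
    /* Reading bytes as uint8_t avoids sign extension when plain char is signed. */
    for (size_t i = 0; i < len; ++i) {
        hash += key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    /* Final mixing so that every input bit affects the whole result. */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```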
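As a usage sketch for the lookup-before-create pattern in the question, here is a chained hash table keyed by the four bytes. It assumes a hypothetical `struct quad`, a fixed table of 1024 buckets, and repeats the one-at-a-time hash as `oaat` only so the sketch compiles on its own; none of these names come from the original post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024u  /* hypothetical table size */

struct quad { uint8_t a, b, c, d; };  /* hypothetical 4-byte record */

struct node {
    struct quad key;
    struct node *next;
};

static struct node *buckets[NBUCKETS];

/* Jenkins one-at-a-time hash, repeated so this sketch compiles alone. */
static uint32_t oaat(const uint8_t *key, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; ++i) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Copy the fields into a byte array so struct padding can never leak in. */
static size_t bucket_of(const struct quad *q)
{
    const uint8_t bytes[4] = { q->a, q->b, q->c, q->d };
    return oaat(bytes, sizeof bytes) % NBUCKETS;
}

static bool quad_equal(const struct quad *x, const struct quad *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c && x->d == y->d;
}

/* Returns true if q is already stored. */
static bool lookup(const struct quad *q)
{
    for (struct node *n = buckets[bucket_of(q)]; n; n = n->next)
        if (quad_equal(&n->key, q))
            return true;
    return false;
}

/* Stores q unless it is already present; returns false only if malloc fails. */
static bool insert_if_absent(const struct quad *q)
{
    size_t b = bucket_of(q);
    for (struct node *n = buckets[b]; n; n = n->next)
        if (quad_equal(&n->key, q))
            return true;
    struct node *n = malloc(sizeof *n);
    if (!n)
        return false;
    n->key = *q;
    n->next = buckets[b];
    buckets[b] = n;
    return true;
}
```

The hash only chooses the bucket; `quad_equal` still compares full keys, so two structs that happen to collide are never confused.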
Side note: even with a good hash value distribution, make sure the hash table is large enough. Performance degrades as the table's occupancy (load factor) approaches 1, because the likelihood of hash collisions increases.
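One way to keep the load factor in check is to grow the table before it fills up. The sketch below is an open-addressing (linear probing) set of packed 32-bit keys that doubles its capacity whenever an insert would push the load factor above 3/4. The threshold, the starting size, and the use of Fibonacci (multiplicative) hashing are assumptions chosen for illustration, and the `keyset_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Open-addressing set of packed 4-byte keys; capacity is a power of two. */
struct keyset {
    uint32_t *keys;
    bool *used;         /* separate flags, since 0 is a valid key */
    uint32_t capacity;  /* == 1u << bits */
    uint32_t count;
    unsigned bits;      /* kept in 1..31 */
};

/* Fibonacci hashing: top `bits` bits of key * floor(2^32 / phi). */
static uint32_t slot_of(uint32_t key, unsigned bits)
{
    return (uint32_t)(key * 2654435769u) >> (32 - bits);
}

static bool keyset_init(struct keyset *s, unsigned bits)
{
    s->bits = bits;
    s->capacity = 1u << bits;
    s->count = 0;
    s->keys = calloc(s->capacity, sizeof *s->keys);
    s->used = calloc(s->capacity, sizeof *s->used);
    if (!s->keys || !s->used) {
        free(s->keys);
        free(s->used);
        return false;
    }
    return true;
}

/* Linear probing: slot holding key, or the empty slot where it belongs.
 * Terminates because the table is never allowed to become full. */
static uint32_t probe(const struct keyset *s, uint32_t key)
{
    uint32_t i = slot_of(key, s->bits);
    while (s->used[i] && s->keys[i] != key)
        i = (i + 1) & (s->capacity - 1);
    return i;
}

/* Rehash every stored key into a table twice as large. */
static bool keyset_grow(struct keyset *s)
{
    struct keyset bigger;
    if (!keyset_init(&bigger, s->bits + 1))
        return false;
    for (uint32_t i = 0; i < s->capacity; ++i) {
        if (s->used[i]) {
            uint32_t j = probe(&bigger, s->keys[i]);
            bigger.used[j] = true;
            bigger.keys[j] = s->keys[i];
            bigger.count++;
        }
    }
    free(s->keys);
    free(s->used);
    *s = bigger;
    return true;
}

/* Inserts key if absent; returns false only on allocation failure. */
static bool keyset_insert(struct keyset *s, uint32_t key)
{
    uint32_t i = probe(s, key);
    if (s->used[i])
        return true;  /* already present */
    if (4u * (s->count + 1) > 3u * s->capacity) {  /* load factor > 3/4 */
        if (!keyset_grow(s))
            return false;
        i = probe(s, key);
    }
    s->used[i] = true;
    s->keys[i] = key;
    s->count++;
    return true;
}

static bool keyset_contains(const struct keyset *s, uint32_t key)
{
    return s->used[probe(s, key)];
}
```

Growing by doubling keeps the average cost of an insert constant, and capping the load factor keeps probe sequences short.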