
Have a good hash function for a C++ hash table?

I need a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I looked around already and only found questions asking what makes a good hash function "in general". I've considered CRC32 (but where can I find a good implementation?) and a few cryptographic algorithms. My table, though, has very specific requirements.

Here's what the table will be like:

- 100,000 items max
- 200,000 capacity (so the load factor is 0.5)
- hashing a 6-character string which is part of an English sentence; examples: "become", "and he", ", not "

The number one priority of my hash table is quick search (retrieval). Quick insertion is not important, but it will come along with quick search. Deletion is not important, and re-hashing is not something I'll be looking into. To handle collisions, I'll probably be using separate chaining as described here. I have already looked at this article, but would like the opinion of those who have handled such a task before.
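A minimal sketch of the separate-chaining layout described above, assuming a fixed array of 200,000 buckets (no rehashing) and using std::hash<std::string> purely as a stand-in for whichever hash function you end up choosing. ChainedTable, insert and find are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: fixed-size separate-chaining table for short string keys.
// The bucket count is chosen once (no rehashing), matching the
// 100,000-item / 200,000-bucket plan from the question.
template <typename Value>
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucket_count = 200000)
        : buckets_(bucket_count) {}

    // Insert key, or overwrite its value if it is already present.
    void insert(const std::string& key, Value value) {
        auto& chain = buckets_[index(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = std::move(value);
                return;
            }
        }
        chain.emplace_back(key, std::move(value));
    }

    // Return a pointer to the stored value, or nullptr if key is absent.
    const Value* find(const std::string& key) const {
        const auto& chain = buckets_[index(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Placeholder hash: swap in whichever hash function you pick.
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    // Each bucket is a chain of (key, value) pairs.
    std::vector<std::vector<std::pair<std::string, Value>>> buckets_;
};
```

Storing each chain in a std::vector rather than a linked list keeps a bucket's entries contiguous, which tends to help lookups when chains are short, as they should be at a load factor of 0.5.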

asked Mar 10 '09 03:03 by DV.

2 Answers

Now, assuming you want a hash and want something blazing fast that would work in your case: because your strings are just 6 chars long, you could use this magic:

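```cpp
#include <cstddef>
#include <cstring>

std::size_t precision = 2; // change the precision with this

std::size_t hash(const char* str) {
    std::size_t h = 0;
    // memcpy instead of *(size_t*)str: no alignment/aliasing UB, and at most 6 bytes are read
    std::memcpy(&h, str, sizeof(h) < 6 ? sizeof(h) : 6);
    return h >> precision;
}
```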

CRC is for slowpokes ;)

Explanation: This works by reinterpreting the bytes of the string as a size_t (a 32- or 64-bit integer, depending on your platform). The contents of the string are treated as a raw number, so there is no per-character work at all, and you then shift the result right by precision bits (tweak this number for the best performance; I've found 2 works well for hashing strings in sets of a few thousand).

Also, the really neat part is that any decent compiler on modern hardware will turn this into just a couple of instructions (essentially a load and a shift), which is hard to beat ;)
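The shifted value still has to be reduced to a slot in the bucket array. Below is a minimal sketch of that step for the question's 200,000-bucket table, writing the answer's idea with std::memcpy rather than a pointer cast so the load is well-defined. kBuckets, kKeyLength, raw_hash and bucket_index are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical constants for this sketch.
constexpr std::size_t kBuckets = 200000;  // capacity from the question
constexpr std::size_t kKeyLength = 6;     // keys are exactly 6 characters

// Treat the key's leading bytes as one integer, then drop the low bits.
std::size_t raw_hash(const char* key, unsigned precision = 2) {
    std::size_t h = 0;
    // Copy at most 6 bytes. When size_t is only 4 bytes wide,
    // just the first 4 characters contribute to the hash.
    std::memcpy(&h, key, sizeof(h) < kKeyLength ? sizeof(h) : kKeyLength);
    return h >> precision;
}

// Reduce the hash to an index into the 200,000-entry bucket array.
std::size_t bucket_index(const char* key) {
    return raw_hash(key) % kBuckets;
}
```

Because the integer is built straight from memory, its value depends on the platform's byte order and on the width of size_t. That is harmless for an in-memory table, but the hashes should not be stored or compared across machines.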

answered Sep 30 '22 02:09 by Robert Gould


This simple polynomial works surprisingly well. I got it from Paul Larson of Microsoft Research, who studied a wide variety of hash functions and hash multipliers.

```cpp
// Larson's multiplicative string hash: h = h * 101 + next character
unsigned hash(const char* s, unsigned salt) {
    unsigned h = salt;
    while (*s)
        h = h * 101 + (unsigned) *s++;
    return h;
}
```

salt should be initialized to some randomly chosen value before the hash table is created, to defend against hash-flooding attacks. If this isn't an issue for you, just use 0.
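A minimal sketch of choosing the salt once per table, assuming std::random_device is an acceptable seed source on your platform (on some implementations it is not truly nondeterministic). The hash function is repeated so the block stands alone; make_salt and SaltedIndex are hypothetical names for illustration.

```cpp
#include <cassert>
#include <random>
#include <string>

// Larson's multiplicative string hash, as in the answer.
unsigned hash(const char* s, unsigned salt) {
    unsigned h = salt;
    while (*s)
        h = h * 101 + (unsigned) *s++;
    return h;
}

// Hypothetical helper: draw one random salt.
unsigned make_salt() {
    std::random_device rd;  // may be deterministic on some implementations
    return rd();
}

// Hypothetical wrapper: the salt is fixed when the table is created and
// reused for every insert and lookup, so a key always maps to one bucket.
struct SaltedIndex {
    unsigned salt = make_salt();
    unsigned bucket_count = 200000;

    unsigned bucket(const std::string& key) const {
        return hash(key.c_str(), salt) % bucket_count;
    }
};
```

The salt must stay constant for the lifetime of a table. Drawing a new one per lookup would send the same key to different buckets.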

The size of the table is important too, to minimize collisions. Sounds like yours is fine.
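As a rough check on that sizing, under the textbook simple-uniform-hashing assumption (every key equally likely to land in any bucket, independently of the others), separate chaining with n = 100,000 keys in m = 200,000 buckets gives the following expected number of keys compared per lookup:

```latex
\begin{aligned}
\alpha &= \frac{n}{m} = \frac{100\,000}{200\,000} = 0.5 \\
E[\text{keys examined, unsuccessful search}] &= \alpha = 0.5 \\
E[\text{keys examined, successful search}] &= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} \approx 1.25
\end{aligned}
```

So with a hash that spreads these keys well, a typical lookup should compare against only one or two stored keys; a poor hash breaks the uniformity assumption and can do much worse.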

answered Sep 30 '22 02:09 by George V. Reilly