What integer hash function is good that accepts an integer hash key?

2 Answers

I found that the following algorithm provides a very good statistical distribution. Each input bit affects each output bit with about 50% probability. There are no collisions (each input results in a different output). The algorithm is fast unless the CPU lacks a built-in integer multiplication unit. C code, assuming int is 32 bits (for Java, replace >> with >>> and remove unsigned):

unsigned int hash(unsigned int x) {
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = (x >> 16) ^ x;
    return x;
}

The magic number was calculated using a special multi-threaded test program that ran for many hours. It measures the avalanche effect (the number of output bits that change if a single input bit is changed; this should be nearly 16 on average), the independence of output bit changes (output bits should not depend on each other), and the probability of a change in each output bit if any input bit is changed. The calculated values are better than those of the 32-bit finalizer used by MurmurHash, and nearly as good (not quite) as when using AES. A slight advantage is that the same constant is used twice (this made it slightly faster the last time I tested; I am not sure whether that is still the case).
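
The avalanche figure described above can be estimated with a few lines of C. This is only a minimal sketch, not the author's test program: the names hash32, popcount32 and average_avalanche are made up for illustration, the inputs are simply 0..samples-1, and only the average number of flipped output bits is measured (not bit independence).

```c
#include <stdint.h>

/* The 32-bit hash from the answer, written with uint32_t so the width is explicit. */
uint32_t hash32(uint32_t x) {
    x = ((x >> 16) ^ x) * UINT32_C(0x45d9f3b);
    x = ((x >> 16) ^ x) * UINT32_C(0x45d9f3b);
    x = (x >> 16) ^ x;
    return x;
}

/* Count the set bits of v (portable, no compiler builtins). */
int popcount32(uint32_t v) {
    int n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Average number of output bits that flip when a single input bit flips,
   taken over the inputs 0..samples-1 and all 32 input bit positions. */
double average_avalanche(uint32_t samples) {
    uint64_t flipped = 0;
    for (uint32_t x = 0; x < samples; x++) {
        uint32_t h = hash32(x);
        for (int bit = 0; bit < 32; bit++) {
            uint32_t h2 = hash32(x ^ (UINT32_C(1) << bit));
            flipped += (uint64_t)popcount32(h ^ h2);
        }
    }
    return (double)flipped / ((double)samples * 32.0);
}
```

Calling average_avalanche(2000) should give a value close to 16 if the hash mixes well; a weak mixer (for example, the identity function) would give exactly 1.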

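You can reverse the process (get the input value from the hash) if you replace 0x45d9f3b with 0x119de1f3 (its multiplicative inverse modulo 2^32). The shift-and-XOR step needs no change, because x ^ (x >> 16) is its own inverse on 32-bit values: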

unsigned int unhash(unsigned int x) {
    x = ((x >> 16) ^ x) * 0x119de1f3;
    x = ((x >> 16) ^ x) * 0x119de1f3;
    x = (x >> 16) ^ x;
    return x;
}
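
For readers who want to check the inverse constant or derive one for a different multiplier, here is a small sketch. The function name modinv32 is hypothetical; the technique is Newton's iteration for inverses modulo a power of two, which works for any odd multiplier.

```c
#include <stdint.h>

/* Multiplicative inverse of an odd number modulo 2^32, by Newton's iteration.
   For odd a, a*a == 1 (mod 8), so inv = a is already correct in the low 3 bits.
   Each step doubles the number of correct bits: 3 -> 6 -> 12 -> 24 -> 48,
   so four steps are enough for 32 bits. The result is meaningless for even a. */
uint32_t modinv32(uint32_t a) {
    uint32_t inv = a;
    for (int i = 0; i < 4; i++) {
        inv *= UINT32_C(2) - a * inv; /* unsigned arithmetic wraps mod 2^32 */
    }
    return inv;
}
```

With the multiplier from the hash, modinv32(0x45d9f3b) yields 0x119de1f3, the constant used in unhash.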

For 64-bit numbers, I suggest using the following, even though it might not be the fastest. This one is based on splitmix64, which seems to be based on the blog article Better Bit Mixing (mix 13).

uint64_t hash(uint64_t x) {
    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    x = x ^ (x >> 31);
    return x;
}

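For Java, use long instead of uint64_t, write the constants with an L suffix instead of UINT64_C, and replace >> with >>>. In this case, reversing is more complicated, because the shifts are smaller than half the word size and x ^ (x >> s) is no longer its own inverse: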

uint64_t unhash(uint64_t x) {
    x = (x ^ (x >> 31) ^ (x >> 62)) * UINT64_C(0x319642b2d24d8ec3);
    x = (x ^ (x >> 27) ^ (x >> 54)) * UINT64_C(0x96de1b173f119089);
    x = x ^ (x >> 30) ^ (x >> 60);
    return x;
}
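
The extra shifted terms in unhash follow a general pattern: to undo y = x ^ (x >> s), XOR together y, y >> s, y >> 2s, and so on until the shift reaches the word size. A minimal sketch of that step, with hypothetical function names (the multiplications are undone separately, using the inverse constants):

```c
#include <stdint.h>

/* Forward step used by the 64-bit hash: x ^ (x >> s), for 1 <= s <= 63. */
uint64_t xor_shift_right(uint64_t x, unsigned s) {
    return x ^ (x >> s);
}

/* Inverse of xor_shift_right for 1 <= s <= 63:
   x = y ^ (y >> s) ^ (y >> 2s) ^ ... while the shift is below 64.
   For s = 31 this gives y ^ (y >> 31) ^ (y >> 62), as in unhash. */
uint64_t undo_xor_shift_right(uint64_t y, unsigned s) {
    uint64_t x = y;
    for (unsigned k = s; k < 64; k += s) {
        x ^= y >> k;
    }
    return x;
}
```

For shifts of 32 or more the loop runs once, which is why the 32-bit version with a shift of 16 can reuse the same expression in both directions.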

Update: You may also want to look at the Hash Function Prospector project, where other (possibly better) constants are listed.


answered Oct 11 '22 06:10

Thomas Mueller

Knuth's multiplicative method:

hash(i)=i*2654435761 mod 2^32

In general, you should pick a multiplier that is on the order of your hash size (2^32 in the example) and has no common factors with it (for a power of two, that means an odd multiplier). This way the hash function covers all your hash space uniformly.

Edit: The biggest disadvantage of this hash function is that it preserves divisibility by powers of two, so if your integers are all divisible by 2 or by 4 (which is not uncommon), their hashes will be too. This is a problem in hash tables: if the bucket index is taken from the low bits of the hash (for example, the hash modulo a power-of-two table size), you can end up with only 1/2 or 1/4 of the buckets being used.
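
A short sketch of the formula and of the divisibility issue described above. The function names are made up for illustration. knuth_bucket is not part of the answer; it shows one common way around the problem, assuming a table with 2^bits buckets: take the bucket index from the top bits of the product instead of the low bits.

```c
#include <stdint.h>

/* Knuth's multiplicative hash: i * 2654435761 mod 2^32.
   Unsigned 32-bit arithmetic wraps around, which performs the "mod 2^32". */
uint32_t knuth_hash(uint32_t i) {
    return i * UINT32_C(2654435761);
}

/* Hypothetical helper (an assumption, not from the answer): bucket index
   from the top `bits` bits of the hash, for a table of 2^bits buckets.
   Requires 1 <= bits <= 32. */
uint32_t knuth_bucket(uint32_t i, unsigned bits) {
    return knuth_hash(i) >> (32u - bits);
}
```

If every key is a multiple of 4, the low two bits of knuth_hash are always zero, so knuth_hash(i) % 16 can only reach 4 of 16 buckets. The top bits depend on all input bits, so knuth_bucket(i, 4) does not have this restriction.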

answered Oct 11 '22 06:10

Rafał Dowgird
