I am looking for a hash-function which operates on a small integer (say in the range 0...1000) and outputs a 64 bit int.
The result-set should look like a random distribution of 64 bit ints: a uniform distribution with no linear correlation between the results.
I was hoping for a function that only takes a few CPU-cycles to execute. (the code will be in C++).
I considered multiplying the input by a big prime number and taking the modulo 2**64 (something like a linear congruent generator), but there are obvious dependencies between the outputs (in the lower bits).
Googling did not show up anything, but I am probably using wrong search terms.
Does such a function exist?
Some Background-info:
I want to avoid using a big persistent table with pseudo random numbers in an algorithm, and calculate random-looking numbers on the fly.
Security is not an issue.
The most commonly used method for hashing integers is called modular hashing: we choose the array size M to be prime, and, for any positive integer key k, compute the remainder when dividing k by M. This function is very easy to compute (k % M, in Java), and is effective in dispersing the keys evenly between 0 and M-1.
A technique for randomizing the input to a cryptographic hash function. Source(s): NIST SP 800-106. A process by which the input to a hash function is randomized before being processed by the hash function. Source(s):
The 32-bit long hash value is a hexadecimal number of 8 characters. MD4 is a Message Digest Algorithm developed by Ronald L. Rivest from RSA Data Security, Inc. Currently it's considered insecure, but it's very fast on 32-bit mashines and it's used for calculating EDonkey 2000 hashes in the EDonkey p2p network.
If you use a random function instead of the hash function, you'll always end with all bits set to 1 (like in the case L≫1), and you cannot use it to approximate the number of distinct elements.
I tested the 64-bit finalizer of MurmurHash3 (suggested by @aix and this SO post). This gives zero if the input is zero, so I increased the input parameter by 1 first:
typedef unsigned long long uint64;
inline uint64 fasthash(uint64 i)
{
i += 1ULL;
i ^= i >> 33ULL;
i *= 0xff51afd7ed558ccdULL;
i ^= i >> 33ULL;
i *= 0xc4ceb9fe1a85ec53ULL;
i ^= i >> 33ULL;
return i;
}
Here the input argument i
is a small integer, for example an element of {0, 1, ..., 1000}
. The output looks random:
i fasthash(i) decimal: fasthash(i) hex:
0 12994781566227106604 0xB456BCFC34C2CB2C
1 4233148493373801447 0x3ABF2A20650683E7
2 815575690806614222 0x0B5181C509F8D8CE
3 5156626420896634997 0x47900468A8F01875
... ... ...
There is no linear correlation between subsequent elements of the series:
The range of both axes is 0..2^64-1
Why not use an existing hash function, such as MurmurHash3 with a 64-bit finalizer? According to the author, the function takes tens of CPU cycles per key on current Intel hardware.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With