I'm looking for a hash function that:
If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes. you are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
In this case n = 2^64 so the Birthday Paradox formula tells you that as long as the number of keys is significantly less than Sqrt[n] = Sqrt[2^64] = 2^32 or approximately 4 billion, you don't need to worry about collisions. The higher the n , the more accurate this estimation.
Why don't you use a long
variant of the default String.hashCode()
(where some really smart guys certainly put effort into making it efficient - not mentioning the thousands of developer eyes that already looked at this code)?
// adapted from String.hashCode() public static long hash(String string) { long h = 1125899906842597L; // prime int len = string.length(); for (int i = 0; i < len; i++) { h = 31*h + string.charAt(i); } return h; }
If you're looking for even more bits, you could probably use a Edit: BigInteger
As I mentioned in a comment to the answer of @brianegge, there are not much usecases for hashes with more than 32 bits and most likely not a single one for hashes with more than 64 bits:
I could imagine a huge hashtable distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point here: 32 bit allow for 2^32 (ca. 4.3 billion) different hash keys. Assuming a strong algorithm, you should still have quite few collisions. With 64 bit (18,446,744,073 billion different keys) your certainly save, regardless of whatever crazy scenario you need it for. Thinking of usecases for 128 bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible though.
To combine the hash for several fields, simply do an XOR multiply one with a prime and add them:
long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2);
The small prime is in there to avoid equal hash code for switched values, i.e. {'foo','bar'} and {'bar','foo'} aren't equal and should have a different hash code. XOR is bad as it returns 0 if both values are equal. Therefore, {'foo','foo'} and {'bar','bar'} would have the same hash code.
An answer for today (2018). SipHash.
It will be much faster than most of the answers here, and significantly higher quality than all of them.
The Guava library has one: https://google.github.io/guava/releases/23.0/api/docs/com/google/common/hash/Hashing.html#sipHash24--
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With