I was looking through some of the .NET source yesterday and saw several implementations of GetHashCode with something along the lines of this:
(i1 << 5) + i1 ^ i2
I understand what the code is doing and why. What I want to know is why they used (i1 << 5) + i1 instead of (i1 << 5) - i1.
Most frameworks I've seen use - i1 because that makes the expression equivalent to multiplying by 31, which is prime, but the Microsoft way is equivalent to multiplying by 33, which has 3 and 11 as factors and thus isn't prime.
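To make the arithmetic concrete, here's a small C# sketch (my own illustration, not the actual .NET source) showing that shift-and-add multiplies by 33 while shift-and-subtract multiplies by 31, and how each would combine two hash codes:

    using System;

    class ShiftMultiplyDemo
    {
        // Combine two hash codes in the pattern quoted above:
        // multiply the first by 33 via shift-and-add, then XOR in the second.
        static int Combine33(int h1, int h2) => unchecked(((h1 << 5) + h1) ^ h2);

        // The 31 variant (shift-and-subtract); 31 is the multiplier
        // Java's String.hashCode uses.
        static int Combine31(int h1, int h2) => unchecked(((h1 << 5) - h1) ^ h2);

        static void Main()
        {
            int h = 0x1234ABCD;
            // Shift-and-add is multiplication by 33 (with wraparound):
            Console.WriteLine(unchecked((h << 5) + h) == unchecked(h * 33)); // True
            // Shift-and-subtract is multiplication by 31:
            Console.WriteLine(unchecked((h << 5) - h) == unchecked(h * 31)); // True
        }
    }

The shift-and-add spelling is usually explained as an artifact of hardware where a shift plus an add was cheaper than a general multiply.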
Is there a known justification for this? Any reasonable hypotheses?
I asked the same question on math.stackexchange.com: Curious Properties of 33.
The conjectures among the mathematicians there and the research I did on the topic lead me to believe that the answer is this:
Okay, I found out why Microsoft uses 33: the construct is known as the Bernstein hash. It turns out that 33 produces a good distribution of hash codes in practice, and there's very little theoretical knowledge as to why.
In entropy and speed comparisons, the Bernstein hash does well enough and is quite snappy. Even Dan Bernstein, the guy who came up with the constant 33, wasn't able to explain what property of 33 produces such a good distribution of hashes.
Several papers comparing hash functions have corroborated this finding without explaining the benefit of using 33 any further. I also couldn't find out why Java uses 31 instead. It appears to be a mathematical and programming mystery to date.
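For reference, here's a minimal sketch of the classic Bernstein string hash (djb2), assuming its widely cited formulation (seed 5381, then hash = hash * 33 + c for each character); the Djb2 name and the demo are my own, not the .NET implementation:

    using System;

    class Djb2Demo
    {
        // The classic djb2 hash as usually published: start at 5381 and
        // fold each character in with hash = hash * 33 + c. The multiply
        // is often written ((hash << 5) + hash), the same trick as above.
        static uint Djb2(string s)
        {
            uint hash = 5381;
            foreach (char c in s)
                hash = unchecked(hash * 33 + c);
            return hash;
        }

        static void Main()
        {
            Console.WriteLine(Djb2("hello")); // 261238937
        }
    }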