This question is not about why one multiplies, that is fairly obvious - its about distribution.
Why use a prime number in hashCode?
But rather this is more about one property of multiplication that becomes more important the more factors are included in a hashcode calculation formula.
A simple calculation obviously may overflow but that is of little importance.
a * 31 + b
The real problem is demonstrated when many items are in the formula.
((a * 31) + b) * 31 ... 6n.
Once more than 5 or 6 terms are include the value of the first term is lost as its bits have overflow by the time the hashcode value is up to including the 5+ term. Using this system only the last 5 or so terms are really significant contributors to the final value.
31 ^ 7 > Integer.MAX_VALUE
So why dont most calculations roll the bits that overflow back around and xor w/ the lower bits of the result. I appreciate this requires some bit fiddling and calculations must be done using longs(64 bits) so the top 32 bits can be XOR'd with the integer result but at the very least no bits would be lost.
Is there any particular reason why overflow is ignored ? Its not that costly to use a long as described previously.
EDIT
100000*31^7= 2751261411100000 0x9C641F717C560
6553600000*31^7 180306667837849600000 0xC641F717C5600000
Note that the latter value is exactly 65536 times larger than the previous which also means that its answer is 16 bits larger. Notice that the integer value of 0xC641F717C5600000 is 0xC5600000 the actual significant values are lost from the 16 bit value.
*SAMPLE A*
65536*4096*27512614111
=7385361114638319616
=0x667E12CDF0000000
12345678
=0xF0000000
*SAMPLE B*
9*65536*4096*27512614111
=66468250031744876544
=0x9A6EA93D70000000
12345678
=0x70000000
Notice that the top most bit of SAMPLE B which is exactly 9x SAMPLE A makes almost absolute no difference in the final 32 bit value - if i changed 9x to 17x then the lower bits would be identical. However if the top most bits were not "lost" due to overflow and xord with the lower 32 bits then the value would be different.
That is the benefit from multiplying by an odd number; the earlier numbers never fall off the end of the integer completely. For an element to be lost, 31^n
would need to be a power of 2, and that cannot happen. In your case, with 31^7
, for example, you get 0x67E12CDF
for a 32-bit number; thus, the input element multiplied by that value will still contribute to the result, despite overflow.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With