Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashcode calculation why multiply and ignore overflow bits?

This question is not about why one multiplies, that is fairly obvious - its about distribution.

Why use a prime number in hashCode?

But rather this is more about one property of multiplication that becomes more important the more factors are included in a hashcode calculation formula.

A simple calculation obviously may overflow but that is of little importance.

a * 31 + b

The real problem is demonstrated when many items are in the formula.

((a * 31) + b) * 31 ... 6n.

Once more than 5 or 6 terms are include the value of the first term is lost as its bits have overflow by the time the hashcode value is up to including the 5+ term. Using this system only the last 5 or so terms are really significant contributors to the final value.

31 ^ 7 > Integer.MAX_VALUE

So why dont most calculations roll the bits that overflow back around and xor w/ the lower bits of the result. I appreciate this requires some bit fiddling and calculations must be done using longs(64 bits) so the top 32 bits can be XOR'd with the integer result but at the very least no bits would be lost.

Is there any particular reason why overflow is ignored ? Its not that costly to use a long as described previously.

EDIT

100000*31^7=            2751261411100000       0x9C641F717C560
6553600000*31^7 180306667837849600000    0xC641F717C5600000

Note that the latter value is exactly 65536 times larger than the previous which also means that its answer is 16 bits larger. Notice that the integer value of 0xC641F717C5600000 is 0xC5600000 the actual significant values are lost from the 16 bit value.

*SAMPLE A*
65536*4096*27512614111  

=7385361114638319616
=0x667E12CDF0000000
   12345678
=0xF0000000

*SAMPLE B*
9*65536*4096*27512614111

=66468250031744876544
=0x9A6EA93D70000000
   12345678
=0x70000000

Notice that the top most bit of SAMPLE B which is exactly 9x SAMPLE A makes almost absolute no difference in the final 32 bit value - if i changed 9x to 17x then the lower bits would be identical. However if the top most bits were not "lost" due to overflow and xord with the lower 32 bits then the value would be different.

like image 901
mP. Avatar asked Feb 10 '11 00:02

mP.


1 Answers

That is the benefit from multiplying by an odd number; the earlier numbers never fall off the end of the integer completely. For an element to be lost, 31^n would need to be a power of 2, and that cannot happen. In your case, with 31^7, for example, you get 0x67E12CDF for a 32-bit number; thus, the input element multiplied by that value will still contribute to the result, despite overflow.

like image 149
Jeremiah Willcock Avatar answered Sep 30 '22 05:09

Jeremiah Willcock