Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why the numbers like 4,20,12,7 used in hash function in `HashMap Class`

I was reading about the fact that How exactly the HashMap works in java .I found code in the hash method in the HashMap class the hashcode is one of the operand with Shift right zero fill operator .The other operands are like 12 7 4 20. Later some more processing is done on the result .My question is why only these four number are chossen for calculating the value in hash function which actually used for calculating the position in the bucket

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
     int i = indexFor(hash, table.length);
     for (Entry<K,V> e = table[i]; e != null; e = e.next) {
         Object k;
         if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
             V oldValue = e.value;
             e.value = value;
             e.recordAccess(this);
             return oldValue;
         }
     }

     modCount++;
     addEntry(hash, key, value, i);
     return null;
}


static int hash(int h) {
     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
}
like image 288
Deepak Avatar asked Oct 20 '22 19:10

Deepak


1 Answers

It’s not that “only these four number are chosen for calculating the value in hash function”, the hash code returned by the hashCode method of the key object is the (very important) input. This method in the HashMap implementation just tries to improve this, given the knowledge about how the HashMap will use that value afterwards.

Typical implementations will only use the lower bits of a hash code as the internal table has a size which is a power of two. Therefore the improvement shall ensure that the likelihood of having different values in the lower bits is the same even if the original hash codes for different keys differ in upper bits only.

Take for example Integer instances used as keys: their hash code is identical to their value as this will spread the hash codes over the entire 2³² int range. But if you put the values 0xa0000000, 0xb0000000, 0xc0000000, 0xd0000000 into the map, a map using only the lower bits would have poor results. This improvement fixes this.

The numbers chosen for this bit manipulation, and the algorithm in general are a field of continuous investigations. You will see changes between JVM implementations as the development never stops.

like image 184
Holger Avatar answered Oct 24 '22 02:10

Holger