
Why and how does HashMap have its own internal implementation of hashCode() called hash()?

According to this blog entry, HashMap re-hashes the hash code it has already retrieved from the key, using its own internal implementation called hash().

If the key is not null, it will call the hash function on the key object (see line 4 in the method above, i.e. key.hashCode()), so after key.hashCode() returns a hashValue, line 4 looks like

int hash = hash(hashValue)

and now it applies the returned hashValue to its own hashing function.

We might wonder why we are calculating the hash value again using hash(hashValue). The answer is: it defends against poor-quality hash functions.
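For reference, the flow the blog describes matches the pre-JDK 8 HashMap, where the internal re-hash took the already-computed int. Below is a minimal, self-contained sketch of that two-step flow; the shift constants are recalled from the JDK 6-era source, so treat them as illustrative rather than authoritative:

public class RehashFlow {
    // Bit-spreading re-hash in the style of the JDK 6/7-era HashMap
    // (constants recalled from the old source; illustrative, not authoritative)
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        Object key = "some key";
        int hashValue = key.hashCode(); // step 1: the key's own hashCode()
        int hash = hash(hashValue);     // step 2: HashMap's defensive re-hash
        System.out.printf("hashCode=0x%08x  rehashed=0x%08x%n", hashValue, hash);
    }
}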

Can HashMap accurately reassign hash codes? HashMap can store objects, but it doesn't have access to the logic that assigns a hashCode to its objects. For example, hash() couldn't possibly integrate the logic behind the following hashCode() implementation:

public class Employee {
    protected long employeeId;
    protected String firstName;
    protected String lastName;

    public int hashCode() {
        return (int) employeeId;
    }
}
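To make the concern concrete, here is a small hypothetical demo (the class name and ids are invented for illustration): whatever HashMap does inside hash(), it only ever sees the int that hashCode() returned, so any information the (int) cast above discards is unrecoverable:

public class EmployeeHashDemo {
    public static void main(String[] args) {
        long id1 = 1L;
        long id2 = (1L << 32) + 1L;  // a different employeeId...
        int h1 = (int) id1;          // ...but the (int) cast from hashCode()
        int h2 = (int) id2;          // collapses both ids to the same int: 1
        // No deterministic re-hash can separate them afterwards:
        System.out.println(h1 == h2);                                  // true
        System.out.println((h1 ^ (h1 >>> 16)) == (h2 ^ (h2 >>> 16)));  // still true
    }
}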
asked Oct 16 '15 by Muno


1 Answer

The hash() method derives the "improved" hash code from the actual hash code, so equal inputs always produce equal outputs (from jdk1.8.0_51):

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
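Note the null check: hash() maps a null key to hash 0 instead of calling hashCode() on it, which is why HashMap (unlike Hashtable) tolerates a single null key. A quick check:

import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put(null, 42);                 // legal: hash(null) is simply 0
        System.out.println(map.get(null)); // prints 42
    }
}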

As to why the hash code needs improvement, read the javadoc of the method:

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
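Here is a small sketch of the failure mode the javadoc describes, using made-up hash codes that differ only in bits above the mask. The (n - 1) & hash masking mirrors how the JDK 8 source derives a bucket index from the (spread) hash:

public class SpreadDemo {
    public static void main(String[] args) {
        int n = 16; // power-of-two table size, as in HashMap
        for (int i = 0; i < 4; i++) {
            int h = i << 16;                          // differs only above the mask
            int raw = (n - 1) & h;                    // index without spreading
            int spread = (n - 1) & (h ^ (h >>> 16));  // index with JDK 8 spreading
            System.out.printf("h=0x%08x  raw=%d  spread=%d%n", h, raw, spread);
        }
    }
}

Without the XOR, all four keys land in bucket 0; with it, they land in buckets 0 through 3.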

answered Sep 21 '22 by Andreas