Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Change to HashMap hash function in Java 8

In java 8 java.util.Hashmap I noticed a change from:

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);

to:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

It appears from the code that the new function is a simpler XOR of the lower 16 bits with the upper 16 leaving the upper 16 bits unchanged, as opposed to several different shifts in the previous implementation, and from the comments that this is less effective at allocating the results of hash functions with a high number of collisions in lower bits to different buckets, but saves CPU cycles by having to do less operations.

The only thing I saw in the release notes was the change from linked lists to balanced trees to store colliding keys (which I thought might have changed the amount of time it made sense to spend calculating a good hash), I was specifically interested in seeing if there was any expected performance impact from this change on large hash maps. Is there any information about this change, or does anyone with a better knowledge of hash functions have an idea of what the implications of this change might be (if any, perhaps I just misunderstood the code) and if there was any need to generate hash codes in a different way to maintain performance when moving to Java 8?

like image 313
MilesHampson Avatar asked Jul 10 '14 09:07

MilesHampson


People also ask

Is there any change in HashMap in Java 8?

In Java 8, HashMap replaces the linked list with another useful data structure i.e. binary tree on breaching a certain threshold, which is known as TREEIFY_THRESHOLD . Once this threshold is reached the linked list of Entries is converted to the TreeNodes which reduces the time complexity from O(n) to O(log(n)) .

What has been the change in HashMap implementation from Java 7 to Java 8?

The hash will change from using a linked list to a balanced tree. Above changes ensure the performance of O(log(n)) in worst case scenarios and O(1) with proper hashCode(). The alternative String hash function added in Java 7 has been removed.

What is the improvement done for HashMap in java8?

Starting from Java 8, one optimization is built-in in HashMap: When buckets are getting too large, they're transformed into trees, instead of linked lists. That brings the pessimistic time of O(n) to O(log(n)), which is much better. For that to work, the keys of HashMap need to implement the Comparable interface.

How does HashMap work in java8?

HashMap works on the principle of hashing data structure or technique that uses an object's hashcode to place that object inside the map. Hashing involves Bucket, Hash function (hashCode() method), and Hash value. It provides the best time complexity of O(1) for insertion and retrieval of objects.


1 Answers

As you noted: there is a significant performance improvement in HashMap in Java 8 as described in JEP-180. Basically, if a hash chain goes over a certain size, the HashMap will (where possible) replace it with a balanced binary tree. This makes the "worst case" behaviour of various operations O(log N) instead of O(N).

This doesn't directly explain the change to hash. However, I would hypothesize that the optimization in JEP-180 means that the performance hit due to a poorly distributed hash function is less important, and that the cost-benefit analysis for the hash method changes; i.e. the more complex version is less beneficial on average. (Bear in mind that when the key type's hashcode method generates high quality codes, then gymnastics in the complex version of the hash method are a waste of time.)

But this is only a theory. The real rationale for the hash change is most likely Oracle confidential.

like image 123
Stephen C Avatar answered Oct 19 '22 01:10

Stephen C