In Java 8's java.util.HashMap I noticed a change from:
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
to:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
It appears from the code that the new function is a simpler XOR of the lower 16 bits with the upper 16, leaving the upper 16 bits unchanged, as opposed to the several different shifts in the previous implementation. Judging from the comments, this is less effective at spreading hash codes that collide heavily in the lower bits across different buckets, but it saves CPU cycles by performing fewer operations.
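To make the bit manipulation concrete, here is a minimal, hypothetical sketch (the class name SpreadDemo and the sample hash codes are my own) of how the Java 8 spreading step interacts with bucket selection. HashMap derives the bucket index from the low bits of the hash (effectively (capacity - 1) & hash), so without the XOR, hash codes that differ only in their upper bits would all land in the same bucket:

public class SpreadDemo {
    // Java 8 style spread: XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16; // small table: the index uses only the low 4 bits

        // Two hash codes that differ only in their upper bits.
        int h1 = 0x00010005;
        int h2 = 0x00020005;

        // Without spreading, both land in bucket 5.
        System.out.println((capacity - 1) & h1);          // 5
        System.out.println((capacity - 1) & h2);          // 5

        // With spreading, the high bits influence the bucket index.
        System.out.println((capacity - 1) & spread(h1));  // 4
        System.out.println((capacity - 1) & spread(h2));  // 7
    }
}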
The only thing I saw in the release notes was the change from linked lists to balanced trees for storing colliding keys, which I thought might have changed how much time it makes sense to spend computing a good hash. I am specifically interested in whether there is any expected performance impact from this change on large hash maps. Is there any information about the change? Does anyone with better knowledge of hash functions have an idea of its implications (if any; perhaps I simply misread the code), and whether hash codes need to be generated differently to maintain performance when moving to Java 8?
In Java 8, HashMap replaces the linked list in a bucket with another data structure, a binary tree, once a certain threshold is breached, known as TREEIFY_THRESHOLD. Once this threshold is reached, the linked list of Entries is converted to TreeNodes, which reduces the lookup time complexity from O(n) to O(log(n)).
The bucket structure changes from a linked list to a balanced tree. This ensures O(log(n)) performance in worst-case scenarios and O(1) with a proper hashCode(). The alternative String hash function added in Java 7 has been removed.
Starting with Java 8, one optimization is built into HashMap: when buckets grow too large, they are transformed into trees instead of linked lists. That brings the worst-case time down from O(n) to O(log(n)), which is much better. For that to work well, the keys of the HashMap should implement the Comparable interface.
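As a rough illustration, here is a hedged sketch (BadKey and TreeifyDemo are invented names) of the kind of pathological key that benefits from this change: every instance returns the same hashCode, so all entries collide in a single bucket, and because the key is Comparable, Java 8 can organize that bucket as a balanced tree once the chain grows past TREEIFY_THRESHOLD (8):

import java.util.HashMap;
import java.util.Map;

// Hypothetical key with a deliberately terrible hashCode: every instance
// hashes identically, so all entries collide in one bucket.
final class BadKey implements Comparable<BadKey> {
    final int id;

    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // constant: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) {
        return Integer.compare(id, other.id);
    }
}

public class TreeifyDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey(i), i);
        }
        // Every key sits in the same bucket. In Java 7 this lookup would walk
        // a 100,000-entry linked list (O(n)); in Java 8 the bucket has been
        // converted to a balanced tree, so it takes O(log n) comparisons.
        System.out.println(map.get(new BadKey(99_999)));  // prints 99999
    }
}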
HashMap works on the principle of hashing: it uses an object's hash code to place that object inside the map. Hashing involves a bucket, a hash function (the hashCode() method), and a hash value. It provides a best-case time complexity of O(1) for insertion and retrieval of objects.
As you noted, there is a significant performance improvement in HashMap in Java 8, as described in JEP-180. Basically, if a hash chain goes over a certain size, the HashMap will (where possible) replace it with a balanced binary tree. This makes the "worst case" behaviour of various operations O(log N) instead of O(N).
This doesn't directly explain the change to hash. However, I would hypothesize that the optimization in JEP-180 makes the performance hit from a poorly distributed hash function less important, so the cost-benefit analysis for the hash method changes; i.e. the more complex version is less beneficial on average. (Bear in mind that when the key type's hashCode method generates high-quality codes, the gymnastics in the complex version of the hash method are a waste of time.)
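To illustrate that last point, here is a small, hypothetical sketch (the class and method names are my own) comparing the Java 7 and Java 8 supplemental hash functions on hash codes that are already well distributed in the low bits; both versions map them to distinct buckets, so the extra mixing in the older version buys nothing in this case:

public class SupplementalHashComparison {
    // Java 7's supplemental hash: several shifts and XORs.
    static int hashJava7(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Java 8's supplemental hash: a single shift and XOR.
    static int hashJava8(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;
        // Hash codes 0..7 are already distinct in the low bits: both versions
        // leave them in distinct buckets, so the heavier mixing is wasted work.
        for (int h = 0; h < 8; h++) {
            int bucket7 = (capacity - 1) & hashJava7(h);
            int bucket8 = (capacity - 1) & hashJava8(h);
            System.out.printf("h=%d  java7 bucket=%d  java8 bucket=%d%n", h, bucket7, bucket8);
        }
    }
}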
But this is only a theory. The real rationale for the hash change is most likely Oracle confidential.