Found this code on http://www.docjar.com/html/api/java/util/HashMap.java.html after searching for a HashMap implementation.
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Can someone shed some light on this? The comment tells us why this code is here, but I would like to understand how it improves a bad hash value and how it guarantees that the bucket positions have a bounded number of collisions. What do these magic numbers mean?
Rehashing of a hash map is done when the number of entries in the map exceeds the threshold (capacity multiplied by the load factor). When rehashing occurs, a new hash function (or even the same hash function) could be used, but because the table length changes, the buckets in which the values end up can change.
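A minimal sketch of why a bucket can change after a resize, assuming the bucket index is computed as h & (length - 1) (the indexFor method quoted in another answer below); the class name and values here are mine, chosen only for illustration:
public class RehashDemo {
    public static void main(String[] args) {
        int h = 0b10111;        // some hash value (23)
        int oldLength = 16;     // table size before the resize
        int newLength = 32;     // table size after the resize (doubled)
        // Same hash, but the index depends on the table length,
        // so the entry can move to a different bucket after rehashing.
        System.out.println(h & (oldLength - 1)); // 7
        System.out.println(h & (newLength - 1)); // 23
    }
}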
HashMap(int initialCapacity) constructs an empty HashMap with the specified initial capacity and the default load factor (0.75); HashMap(int initialCapacity, float loadFactor) constructs an empty HashMap with the specified initial capacity and load factor. As @Xoce mentioned, you can't change the loadFactor later, and I agree with him on this. Set it while creating the HashMap.
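A short usage example of those two constructors (the capacity and load factor values here are arbitrary, picked only for illustration):
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // initial capacity 32, default load factor 0.75
        Map<String, Integer> a = new HashMap<>(32);
        // initial capacity 32, custom load factor 0.9 (fixed at construction)
        Map<String, Integer> b = new HashMap<>(32, 0.9f);
        b.put("answer", 42);
        System.out.println(b.get("answer")); // 42
    }
}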
HashMap stores entries in multiple singly linked lists, called buckets or bins. The default number of bins is 16, and it is always a power of two. HashMap uses the hashCode() and equals() methods of keys for get and put operations, so a HashMap key class should provide good implementations of both, as in the sketch below.
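A sketch of such a key class (the Point type is hypothetical, not from the original post): equal keys must produce equal hash codes, otherwise get cannot find what put stored.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {
    // Hypothetical key type with consistent equals()/hashCode()
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // combines both fields
        }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "a");
        // Works because equals() and hashCode() agree:
        System.out.println(map.get(new Point(1, 2))); // a
    }
}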
In Java 8, HashMap replaces the linked list in a bucket with a balanced binary tree (a red-black tree) when the number of elements in that bucket reaches a certain threshold (8 by default). While converting the list to a tree, the hash code is used as a branching variable.
In order for it to make any sense, it has to be combined with an understanding of how HashMap allocates entries into buckets. This is the trivial function by which a bucket index is chosen:
static int indexFor(int h, int length) {
    return h & (length - 1);
}
So you can see that with the default table size of 16, only the 4 least significant bits of the hash actually matter for allocating buckets! (16 - 1 = 15, which masks the hash with 1111b.)
This could clearly be bad news if your hashCode function returned:
10101100110101010101111010111111
01111100010111011001111010111111
11000000010100000001111010111111
//etc etc etc
Such a hash function would likely not be "bad" in any way that is visible to its author. But if you combine it with the way the map allocates buckets, boom, MapFail(tm).
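To make that concrete, here is a small, hypothetical check (the class and array names are mine) that feeds those three hash codes straight into indexFor without any spreading:
public class CollisionDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // the three example hash codes from above, as binary int literals
        int[] hashes = {
            0b1010_1100_1101_0101_0101_1110_1011_1111,
            0b0111_1100_0101_1101_1001_1110_1011_1111,
            0b1100_0000_0101_0000_0001_1110_1011_1111
        };
        for (int h : hashes) {
            // all three share the low bits 1111, so every one hits bucket 15
            System.out.println(indexFor(h, 16));
        }
    }
}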
If you keep in mind that h is a 32-bit number, those are not magic numbers at all. The function systematically XORs the most significant bits of the number rightward into the least significant bits. The purpose is that "differences" in the number that occur anywhere "across" it, when viewed in binary, become visible down in the least significant bits.
Collisions become bounded because the number of distinct values that share the same relevant low bits is now significantly reduced, since any differences that occur anywhere in the binary representation are compressed into the bits that matter for bucketing.
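Here is a small, hypothetical driver (again, class and array names are mine) that runs the same three hash codes through the hash() spreading function from the question before picking a bucket:
public class SpreadDemo {
    // the spreading function quoted in the question
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {
            0b1010_1100_1101_0101_0101_1110_1011_1111,
            0b0111_1100_0101_1101_1001_1110_1011_1111,
            0b1100_0000_0101_0000_0001_1110_1011_1111
        };
        for (int h : hashes) {
            // without spreading every value lands in bucket 15;
            // after spreading, differences in the high bits reach the low bits
            System.out.println(indexFor(h, 16) + " -> " + indexFor(hash(h), 16));
        }
    }
}
Running this, the three values that previously all collided in bucket 15 should now end up in different buckets, because bits from the top of the word now influence the low four bits that indexFor looks at.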