What is the algorithm used in the JVM
to implement java.lang.Object
's implicit hashCode()
method?
[OpenJDK
or Oracle JDK
are preferred in the answers].
An efficient hashing algorithm generates distinct hashcodes for unequal objects. The hashCode method should be consistent in its implementation of the equals() function. hashCode() and hashCode(int value) methods, defined in the Integer class, compute the hash codes of integer objects or values in Java.
Simply put, hashCode() returns an integer value, generated by a hashing algorithm. Objects that are equal (according to their equals()) must return the same hash code. Different objects do not need to return different hash codes.
A hash code is an integer value that is associated with each object in Java. Its main purpose is to facilitate hashing in hash tables, which are used by data structures like HashMap.
To get this hashcode value for an object, we can use the hashcode() method in Java. It is the means hashcode() method that returns the integer hashcode value of the given object. Since this method is defined in the Object class, hence it is inherited by user-defined classes also.
It's implementation dependant (and heavily so, the algorithm is entirely up to the implementation, as long as it's consistent.) However, as per the answer here, you can see the native source file where the hash is generated in OpenJDK 7 (look at the get_next_hash()
function), which actually specifies a number of possible algorithms in this particular release:
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
// 2654435761 = 2^32 * Phi (golden ratio)
// HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
// in undesirable regularity in the hashCode values of adjacent objects
// (objects allocated back-to-back, in particular). This could potentially
// result in hashtable collisions and reduced hashtable efficiency.
// There are simple ways to "diffuse" the middle address bits over the
// generated hashCode values
//
As already stated, the default algorithm is to simply go with a random number, however an interesting comment further down states:
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
This is similar to the (obj ^ stwRandom)
approach mentioned in the above list of possibilities, but it gets around the unwanted regularities associated with objects allocated quickly in succession by tying "thread-specific state" information into the hash also - so if two objects happened to be allocated at a very similar time due to them being allocated on two separate threads executing simultaneously, the discrete threading information fed into the hash should still ensure discrete enough hashes will be generated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With