The default implementation of hashCode() on HotSpot returns a random value and stores it in the object header. This doesn't seem to have changed in Java 8, where the hash value is calculated by a call to os::random():
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  ...
I'm wondering why hashCode() constantly returns the same value, even after shutting down the JVM. I tested this by running the simple test below, restarting my machine, and then running main() again.
public class SimpleTest {
    public static void main(String[] args) {
        Object obj = new Object();
        // This calls toString() which calls hashCode() which calls os::random()
        System.out.println(obj);
    }
}
How can the output be the same every time if hashCode() is actually os::random()?
java -version gives
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
Note: In case anyone wonders what System.out.println(obj), which calls obj.toString() if the object is non-null and produces something like java.lang.Object@659e0bfd, has to do with hashCode(): the part after @ is the object's hash code in hexadecimal (and is unrelated to the object's location in memory, contrary to what the documentation suggests, which has led to misunderstandings).
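The relationship between toString() and hashCode() described in the note can be checked directly. The following is a minimal sketch; the class name IdentityHashDemo and the helper describe() are made up for illustration. It relies only on the documented behavior of Object.toString(), which returns getClass().getName() + "@" + Integer.toHexString(hashCode()).

```java
// Minimal sketch; IdentityHashDemo and describe() are hypothetical names.
final class IdentityHashDemo {

    // Rebuilds the string Object.toString() is documented to return:
    // class name, '@', then the hash code in unsigned hexadecimal.
    static String describe(Object obj) {
        return obj.getClass().getName() + "@" + Integer.toHexString(obj.hashCode());
    }

    public static void main(String[] args) {
        Object obj = new Object();
        System.out.println(obj);           // something like java.lang.Object@659e0bfd
        System.out.println(describe(obj)); // same text as the line above

        // For a plain Object, hashCode() is the identity hash, and it does not
        // change between calls on the same object within one JVM run.
        System.out.println(obj.hashCode() == System.identityHashCode(obj));
    }
}
```

The actual hexadecimal value depends on the JVM and its settings, so no specific output is claimed here.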
1) If two objects are equal (i.e. the equals() method returns true), they must have the same hash code.
2) If the hashCode() method is called multiple times on the same object, it must return the same result every time.
3) Two different objects can have the same hash code.
If multiple objects return the same value from hashCode(), it means that they would be stored in the same bucket. If many objects are stored in the same bucket it means that on average it requires more comparison operations to look up a given object.
If the equals() method is implemented as per the contract and the hashCode() method returns a constant value, we will still be able to retrieve the value for a key from a HashMap, but lookups will be slower than with a hashCode() that spreads keys across buckets.
HashCode collisions
Whenever two different objects have the same hash code, we call this a collision. A collision is nothing critical; it just means that there is more than one object in a single bucket, so a HashMap lookup has to compare further to find the right object.
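To make the constant-hash case concrete, here is a small sketch. The names ConstantHashDemo, Key, and build() are invented for this example. Every key collides, yet lookups still succeed because equals() distinguishes the keys; only the lookup cost suffers.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch; ConstantHashDemo, Key and build() are hypothetical names.
final class ConstantHashDemo {

    // equals() follows the contract; hashCode() is deliberately constant.
    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // legal, but every key lands in the same bucket
        }
    }

    // Fills a map with n colliding keys "k0" .. "k(n-1)" mapped to their index.
    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(1000);
        System.out.println(map.size());               // prints 1000
        System.out.println(map.get(new Key("k500"))); // prints 500
    }
}
```

Nothing is lost despite 1000 collisions; the map simply has to compare keys with equals() inside one crowded bucket.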
Deterministic behavior makes code easier to debug because it can be replicated, so implementations tend to choose it where possible. Imagine how hard it would be to replicate a unit test that failed due to mishandling a hash collision (say, after a hash was reduced in length) if the hashes were different every time.
To answer your question, we first have to ask the secondary question, "Why is os::random() seeded with a fixed seed?"
As @DavidSchwartz suggested, having a "random" number generator with a fixed seed is very useful, as it gives you arbitrary but deterministic behavior. The JVM developers can call os::random() and still know the behavior of the JVM isn't dependent on any external factors. Among other benefits, this means JVM tests are repeatable; using a "properly" seeded RNG would make it difficult to reproduce failures related to the RNG.
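The effect of a fixed seed can be illustrated with a tiny Park-Miller ("minimal standard") generator, the family of RNG named in the HotSpot comment quoted in the question. This is only a sketch in Java, not HotSpot's code: the class name ParkMillerDemo is made up, and the seed of 1 is an arbitrary value chosen for illustration, not the seed HotSpot uses. The constants 16807 and 2^31 - 1 are the standard Park-Miller parameters.

```java
// Minimal sketch of a Park-Miller generator; ParkMillerDemo is a hypothetical
// name and the seed used in main() is an assumption for illustration only.
final class ParkMillerDemo {
    private static final long MULTIPLIER = 16807L;     // 7^5
    private static final long MODULUS = 2147483647L;   // 2^31 - 1

    private long state;

    ParkMillerDemo(long seed) {
        this.state = seed;
    }

    // next = (16807 * previous) mod (2^31 - 1), computed in 64-bit arithmetic
    // so the intermediate product cannot overflow.
    int next() {
        state = (MULTIPLIER * state) % MODULUS;
        return (int) state;
    }

    public static void main(String[] args) {
        // Two generators started from the same seed stand in for two JVM runs.
        ParkMillerDemo firstRun = new ParkMillerDemo(1L);
        ParkMillerDemo secondRun = new ParkMillerDemo(1L);
        System.out.println(firstRun.next());  // prints 16807
        System.out.println(secondRun.next()); // prints 16807
        System.out.println(firstRun.next());  // prints 282475249
    }
}
```

Because the sequence is fully determined by the seed, the first value drawn is identical in every "run", which mirrors why the first hashCode() in the question's test never changes. The value an object receives does depend on how many draws happened before it.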
Now we can answer the original question, rephrased as "Why does HotSpot's implementation of Object.hashCode() use os::random()?"
The answer is most likely that it's easy and it works. Hash codes need to be well-distributed, which is something an RNG provides. The simplest, most accessible RNG in this area of the JVM is os::random(). Since Object.hashCode() provides no guarantees about the source of these values, it doesn't matter that os::random() isn't really random at all.
You'll notice that this is only one possible hashing strategy; several others are defined (and selected by the hashCode global), including one which they will "likely make ... the default in future releases."
Ultimately, this is just an implementation detail. There is simply no need to randomize Object.hashCode() more aggressively, and it's entirely possible that other JVMs don't do it this way, or that other operating systems behave differently. In fact, in Eclipse I see different hash codes when running your code repeatedly. Furthermore, the contract for Object.hashCode() suggests that typical JVM implementations don't implement Object.hashCode() this way at all:
This is typically implemented by converting the internal address of the object into an integer
Note also that your test only verifies that the first call to .hashCode() is consistent. In any sort of multi-threaded program you could not expect this behavior. It also relies on nothing else in the JVM calling os::random() during execution, which could happen at any time (for instance, if the garbage collector relies on os::random(), the result of .hashCode() calls after the first GC will be non-deterministic).