Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a reason Object.hashCode() is 31-bit?

Tags:

java

hashcode

If you run the following on HotSpot Java 7 64-bit version.

int countTopBit = 0, countLowestBit = 0;
for (int i = 0; i < 100000000; i++) {
    int h = new Object().hashCode();
    if (h < 0)
        countTopBit++;
    if ((h & 1) == 1)
        countLowestBit++;
}
System.out.println("The count of negative hashCodes was " + countTopBit + ", the count of odd hashCodes was " + countLowestBit);

you can get a result like

The count of negative hashCodes was 0, the count of odd hashCodes was 49994232

I was wondering if this means the Object.hashCode() is only really 31-bit and why this might be so?


It is not the case that the top bit is not used. From the source for HashMap

257   /**
258    * Applies a supplemental hash function to a given hashCode, which
259    * defends against poor quality hash functions.  This is critical
260    * because HashMap uses power-of-two length hash tables, that
261    * otherwise encounter collisions for hashCodes that do not differ
262    * in lower bits. Note: Null keys always map to hash 0, thus index 0.
263    */
264   static int hash(int h) {
265       // This function ensures that hashCodes that differ only by
266       // constant multiples at each bit position have a bounded
267       // number of collisions (approximately 8 at default load factor).
268       h ^= (h >>> 20) ^ (h >>> 12);
269       return h ^ (h >>> 7) ^ (h >>> 4);
270   }
like image 429
Peter Lawrey Avatar asked Jan 21 '13 09:01

Peter Lawrey


1 Answers

HotSpot supports a variety of hashing algorithms for Object. As you've discovered emprically, the top bit is always masked out before the result is returned:

// src/share/vm/runtime/synchronizer.cpp
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
   ...
   value &= markOopDesc::hash_mask;
   ...
   return value;
}

markOopDesc::hash_mask is computed as follows:

  enum { age_bits                 = 4,
         lock_bits                = 2,
         biased_lock_bits         = 1,
         max_hash_bits            = BitsPerWord - age_bits - lock_bits - biased_lock_bits,
         hash_bits                = max_hash_bits > 31 ? 31 : max_hash_bits,
         ...
         hash_mask               = right_n_bits(hash_bits),

As you can see, markOopDesc::hash_mask always has bit 31 set to zero.

As to why this is done, your guess is as good as mine. It could have been that the original developer felt that only dealing with positive integers would simplify things down the line. For all we know, it could even be an off-by-one error in the hash_bits computation. ;-)

like image 72
NPE Avatar answered Sep 19 '22 09:09

NPE