
Explanation of the constants used while calculating the hash code value in java.util.HashMap

Tags:

java

hash

Can someone explain the significance of these constants and why they are chosen?

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

source: java-se6 library

Phoenix asked Sep 03 '12 20:09


1 Answer

Understanding what makes for a good hash function is tricky, as a great many different functions are in use, each for slightly different purposes.

Java's hash tables work as follows:

  1. They ask the key object to produce its hash code. The implementation of the hashCode() method is likely to be of distinctly variable quality (in the worst case, returning a constant value!) and will definitely not be adapted to the particular hash table you're working with.
  2. They then use the above function to mix the bits up a bit, so that information present in the high bits also gets moved down to the low bits. This is important because next …
  3. They take the mod of the hash code (w.r.t. the number of hash table array entries) to get the index into the array of hash table chains. The array size is often a power of 2 (in java.util.HashMap it always is), in which case the mod keeps only the low bits, so the mixing down of the bits in step 2 is important to ensure that the high bits don't just get thrown away.
  4. They then traverse the chain until they get to the entry with an equal key (according to the equals() method).
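
The four steps above can be sketched roughly as follows. This is a simplified illustration rather than the JDK source: LookupSketch, Node, get and put are hypothetical names, the table never resizes, and the table length is assumed to be a power of 2 so that the mod becomes a bit mask.

```java
public class LookupSketch {
    // Hypothetical minimal chain node; not the JDK's internal entry class.
    static final class Node {
        final Object key;
        final Object value;
        final Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Step 2: the mixing function quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Step 3: for a power-of-two length, "mod length" is the same as
    // masking off everything except the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    static Object get(Node[] table, Object key) {
        int h = hash(key.hashCode());                     // steps 1 and 2
        int i = indexFor(h, table.length);                // step 3
        for (Node n = table[i]; n != null; n = n.next) {  // step 4
            if (key.equals(n.key)) {
                return n.value;
            }
        }
        return null;
    }

    // Simplified insert: prepends to the chain without checking for an
    // existing mapping (an assumption made to keep the sketch short).
    static void put(Node[] table, Object key, Object value) {
        int i = indexFor(hash(key.hashCode()), table.length);
        table[i] = new Node(key, value, table[i]);
    }

    public static void main(String[] args) {
        Node[] table = new Node[16];
        put(table, "one", 1);
        put(table, "two", 2);
        System.out.println(get(table, "one"));   // prints 1
        System.out.println(get(table, "three")); // prints null
    }
}
```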

To complete the picture, the number of entries in the hash table array is not constant; when the table gets too full (HashMap resizes once the number of mappings exceeds capacity times load factor, which keeps the chains short), the array gets replaced with a new, larger array and everything gets rehashed. That's relatively fast and performs well for normal use patterns (e.g., lots of put()s followed by lots of get()s).
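
As a rough illustration of that rehashing step, here is a minimal sketch that doubles a chained table and moves every node to its new bucket. ResizeSketch, Node and resize are hypothetical names, each node is assumed to cache its already-mixed hash, and the table length is assumed to be a power of 2; the real HashMap code differs in detail.

```java
public class ResizeSketch {
    // Hypothetical chain node that remembers its (already mixed) hash.
    static final class Node {
        final int hash;
        final Object key;
        Node next;

        Node(int hash, Object key) {
            this.hash = hash;
            this.key = key;
        }
    }

    // Build a table twice as large and relink every node into the bucket
    // selected by the new, wider bit mask.
    static Node[] resize(Node[] old) {
        Node[] bigger = new Node[old.length * 2];
        for (Node head : old) {
            Node n = head;
            while (n != null) {
                Node next = n.next;                 // remember the rest of the old chain
                int i = n.hash & (bigger.length - 1);
                n.next = bigger[i];                 // prepend to the new chain
                bigger[i] = n;
                n = next;
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        Node[] table = new Node[2];
        Node a = new Node(1, "a");
        Node b = new Node(3, "b");
        a.next = b;      // hashes 1 and 3 share bucket 1 when the length is 2
        table[1] = a;

        Node[] resized = resize(table);
        System.out.println(resized.length);                // prints 4
        System.out.println(resized[1].key + " " + resized[3].key); // prints a b
    }
}
```

Doubling the length exposes one more bit of each hash to the index calculation, which is why two keys that shared a chain can end up in different buckets afterwards.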

The actual constants used are fairly arbitrary (and are probably chosen by experiment with some simple corpus including things like large numbers of Integer and String values) but their purpose is not: getting the information in the whole value spread to most of the low bits in the value ensures that such information as is present in the output of the hashCode() is used as well as possible.
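
A small experiment makes the purpose concrete. The sketch below feeds sixteen hash codes that differ only in bits 16 to 19 into a 16-slot table, once with and once without the mixing function from the question. MixingDemo, highBitCodes and distinctBuckets are hypothetical helper names, and the input pattern is chosen purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class MixingDemo {
    // The mixing function quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Hash codes 0<<16, 1<<16, ..., (n-1)<<16: all information is in the high bits.
    static int[] highBitCodes(int n) {
        int[] codes = new int[n];
        for (int i = 0; i < n; i++) {
            codes[i] = i << 16;
        }
        return codes;
    }

    // Count how many different buckets the hash codes land in for a
    // power-of-two table length, with or without the mixing step.
    static int distinctBuckets(int[] hashCodes, int tableLength, boolean mix) {
        Set<Integer> used = new HashSet<>();
        for (int hc : hashCodes) {
            int h = mix ? hash(hc) : hc;
            used.add(h & (tableLength - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] codes = highBitCodes(16);
        System.out.println(distinctBuckets(codes, 16, false)); // prints 1
        System.out.println(distinctBuckets(codes, 16, true));  // prints 16
    }
}
```

Without mixing, every key falls into bucket 0 because the low four bits are all zero; with mixing, the high-bit differences reach the low bits and the sixteen keys occupy sixteen different buckets.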

(You wouldn't do this with perfect hashing or cryptographic hashing; despite the similar names, they have very different implementation strategies. The former requires knowledge of the key space so that collisions are avoided/reduced, and the latter needs information to be moved about in all directions, not just to the low bits.)

Donal Fellows answered Oct 26 '22 14:10