 

Order of items in a HashMap differs when the same program is run on JVM 5 vs JVM 6

I have an application which displays a collection of objects in rows, one object = one row. The objects are stored in a HashMap. The order of the rows does not affect the functionality of the application (that is why a HashMap was used instead of a sortable collection).

However, I have noticed that the same application displays the rows in a different order when run under two different versions of the Java Virtual Machine. The application is compiled with JDK 5 and can be run on either the Java 5 or Java 6 runtime without any functional difference.

The objects in question override java.lang.Object#hashCode(), and care has been taken to follow the contract specified in the Java API. This is evidenced by the fact that they always appear in the same order on every run of the application (within the same Java runtime).

For curiosity's sake, why does the choice of Java runtime affect the order?
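For illustration, here is a minimal sketch of the situation (the Row class and its fields are hypothetical, not from the original application); running it under different Java runtimes may print the keys in a different order, even though the order is stable within any one runtime:

import java.util.HashMap;
import java.util.Map;

public class IterationOrderDemo {

    // Hypothetical stand-in for the application's row object.
    static final class Row {
        final int id;
        Row(int id) { this.id = id; }

        @Override
        public int hashCode() { return id * 31; }   // honours the hashCode contract

        @Override
        public boolean equals(Object o) {
            return o instanceof Row && ((Row) o).id == id;
        }

        @Override
        public String toString() { return "Row#" + id; }
    }

    public static void main(String[] args) {
        Map<Row, String> rows = new HashMap<Row, String>();
        for (int i = 0; i < 10; i++) {
            rows.put(new Row(i), "value " + i);
        }
        // The order printed here is stable for a given runtime,
        // but may differ between Java 5 and Java 6.
        for (Row key : rows.keySet()) {
            System.out.println(key);
        }
    }
}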

asked Dec 10 '09 by bguiz

People also ask

In what order are values stored in HashMap?

The values are printed in the order in which they have been inserted.

Does HashMap guarantee default ordering?

HashMap is unordered; you can't and shouldn't assume anything beyond that. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

Does HashMap sort automatically?

No, HashMaps don't sort their keys automatically. You want a TreeMap for sorted keys, or a LinkedHashMap to retain insertion order.
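A short sketch of the difference, assuming plain String keys (illustrative code, not from the question or answers):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        String[] keys = { "banana", "apple", "cherry" };

        Map<String, Integer> hash = new HashMap<String, Integer>();         // no defined order
        Map<String, Integer> linked = new LinkedHashMap<String, Integer>(); // insertion order
        Map<String, Integer> tree = new TreeMap<String, Integer>();         // sorted key order

        for (String k : keys) {
            hash.put(k, k.length());
            linked.put(k, k.length());
            tree.put(k, k.length());
        }

        System.out.println("HashMap:       " + hash.keySet());   // implementation-dependent
        System.out.println("LinkedHashMap: " + linked.keySet()); // [banana, apple, cherry]
        System.out.println("TreeMap:       " + tree.keySet());   // [apple, banana, cherry]
    }
}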

What happens when 2 objects are the same while inserting it into the HashMap?

Whenever two different objects have the same hash code, we call this a collision. A collision is nothing critical, it just means that there is more than one object in a single bucket, so a HashMap lookup has to look again to find the right object.
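To illustrate, here is a small sketch with a deliberately colliding key class (AlwaysCollides is made up for this example); both entries end up in the same bucket, and equals() is what distinguishes them on lookup:

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    // Deliberately poor key: every instance has the same hash code.
    static final class AlwaysCollides {
        final String name;
        AlwaysCollides(String name) { this.name = name; }

        @Override
        public int hashCode() { return 42; }          // every key collides

        @Override
        public boolean equals(Object o) {
            return o instanceof AlwaysCollides
                    && ((AlwaysCollides) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<AlwaysCollides, String> map = new HashMap<AlwaysCollides, String>();
        map.put(new AlwaysCollides("a"), "first");
        map.put(new AlwaysCollides("b"), "second");

        // Both entries live in the same bucket; lookups still work,
        // because equals() distinguishes the keys within the bucket.
        System.out.println(map.get(new AlwaysCollides("a"))); // first
        System.out.println(map.get(new AlwaysCollides("b"))); // second
        System.out.println(map.size());                       // 2
    }
}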


2 Answers

The implementation details of HashMap can and do change. Most likely this package-private method did (this is from JDK 1.6.0_16):

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
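For context, here is a rough sketch (my own illustration, not JDK source) of how this supplemental hash feeds into bucket selection for HashMap's power-of-two sized table:

public class BucketIndexDemo {

    static int supplementalHash(int h) {           // same mixing as JDK 1.6.0_16 above
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int bucketIndex(Object key, int tableLength) {
        // tableLength is always a power of two in HashMap, so the mask below
        // is equivalent to a modulo operation.
        return supplementalHash(key.hashCode()) & (tableLength - 1);
    }

    public static void main(String[] args) {
        // Picks a slot within HashMap's default 16-bucket table.
        System.out.println(bucketIndex("example", 16));
    }
}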

For reference, the analogue in JDK 1.5.0_06 is:

/**
 * Returns a hash value for the specified object.  In addition to 
 * the object's own hashCode, this method applies a "supplemental
 * hash function," which defends against poor quality hash functions.
 * This is critical because HashMap uses power-of two length 
 * hash tables.<p>
 *
 * The shift distances in this function were chosen as the result
 * of an automated search over the entire four-dimensional search space.
 */
static int hash(Object x) {
    int h = x.hashCode();

    h += ~(h << 9);
    h ^=  (h >>> 14);
    h +=  (h << 4);
    h ^=  (h >>> 10);
    return h;
}
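To see why this matters for iteration order, here is a small comparison sketch (my own code, not from the JDK): the same hashCode is mixed differently by the two functions, so a given key can end up in a different bucket, and HashMap iterates bucket by bucket:

public class HashComparisonDemo {

    static int jdk5Hash(int h) {        // supplemental hash from JDK 1.5.0_06
        h += ~(h << 9);
        h ^=  (h >>> 14);
        h +=  (h << 4);
        h ^=  (h >>> 10);
        return h;
    }

    static int jdk6Hash(int h) {        // supplemental hash from JDK 1.6.0_16
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int tableLength = 16;           // HashMap's default capacity
        for (String key : new String[] { "alpha", "beta", "gamma" }) {
            int h = key.hashCode();
            int bucket5 = jdk5Hash(h) & (tableLength - 1);
            int bucket6 = jdk6Hash(h) & (tableLength - 1);
            System.out.println(key + ": JDK5 bucket=" + bucket5
                               + ", JDK6 bucket=" + bucket6);
        }
    }
}

Keys that land in different buckets under the two runtimes will therefore be visited in a different order during iteration, even though each runtime on its own is perfectly consistent.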
answered Sep 28 '22 by Michael Borgwardt

Probably because a Map is not defined to have any particular iteration order; the order in which the elements come back is likely to be an artifact of its internal implementation and does not need to stay consistent.

If the implementation gets updated between Java 5 and 6 (especially for performance reasons), there is no benefit to, nor obligation on, Sun to make sure the iteration order stays consistent between the two.

EDIT: I just found an interesting snippet in one of the early Java 6 releases (unfortunately I'm not sure of the exact version but it's apparently HashMap 1.68 from June 2006):

 /**
  * Whether to prefer the old supplemental hash function, for
  * compatibility with broken applications that rely on the
  * internal hashing order.
  *
  * Set to true only by hotspot when invoked via
  * -XX:+UseNewHashFunction or -XX:+AggressiveOpts
  */
 private static final boolean useNewHash;
 static { useNewHash = false; }

 private static int oldHash(int h) {
     h += ~(h << 9);
     h ^= (h >>> 14);
     h += (h << 4);
     h ^= (h >>> 10);
     return h;
 }

 private static int newHash(int h) {
     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
 }

So it seems that, despite my assertions above, Sun did in fact consider the consistency of iteration order; at some later point this code was presumably dropped and the new order was made the definitive one.

answered Sep 28 '22 by Andrzej Doyle