Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet order and difference with JDK 7 / 8

This is a two part question:

  1. Does HashSet implement some hidden ordering mechanic or it just, to quote the documentation:It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. tells me that the order MIGHT change sometimes in the future and/or depending on memory usage?
  2. Why am I getting completely different 'ordering' (dare I say) when I switch between JDKs?

Take this example:

for (int i = 0; i < 1000; i++) {
        Set<String> stringSet = new HashSet<>();
        stringSet.add("qwe");
        stringSet.add("rtz");
        stringSet.add("123");
        stringSet.add("qwea");
        stringSet.add("12334rasefasd");
        stringSet.add("asdxasd");
        stringSet.add("arfskt6734");
        stringSet.add("123121");
        stringSet.add("");
        stringSet.add("qwr");
        stringSet.add("rtzz");
        stringSet.add("1234");
        stringSet.add("qwes");
        stringSet.add("1234rasefasd");
        stringSet.add("asdxasdq");
        stringSet.add("arfskt6743");
        stringSet.add("123121 ");
        stringSet.add(" ");
        System.out.println(stringSet);
    }

will, no matter how many times I run it, produce the following output:

JDK 7: [, , 123, qwea, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtz, rtzz, 1234, 12334rasefasd, asdxasd, arfskt6734, 123121]

JDK 8: [, , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd]

Obviously, the empty string and the whitespace-only-string are leading the way both times, but the rest of it is completely different.

like image 365
Ognjen Mišić Avatar asked Dec 07 '22 17:12

Ognjen Mišić


1 Answers

According to the update on collections changes page

The alternative String hash function added in 7u6 has been removed from JDK 8, along with the jdk.map.althashing.threshold system property. Instead, hash bins containing a large number of colliding keys improve performance by storing their entries in a balanced tree instead of a linked list. This JDK 8 change applies only to HashMap, LinkedHashMap, and ConcurrentHashMap.

In rare situations, this change could introduce a change to the iteration order of HashMap and HashSet. A particular iteration order is not specified for HashMap objects - any code that depends on iteration order should be fixed.


So, Basically

The Algorithm for hashing your set has changed to improve performance. It changed to a balanced tree instead of a linked list.

This kind of change might change the iteration order of your set, and it's established that this kind of behavior should be fixed by you, if you depend upon it.

You're seeing a better implementation of your set, which might look like it's ordered, but it's pure coincidence.

I would recommend you to not rely on the iteration order for sets, as the order is not a guarantee.

@Edit

Another concept is also important, as stated by user Holger,

The “alternative String hash function” of Java 7 was not used by default. Further, the balanced tree only applies to bucket collision scenarios. Still, as a consequence of this improvement, another, unmentioned change has been made. The mapping of an object’s hashcode to an array position undergoes a transformation which has been simplified from Java 7 to Java 8

like image 148
LeoColman Avatar answered Dec 10 '22 06:12

LeoColman