Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do HashSets behave differently in Java 8 and Java 9+?

When trying to remove objects wrapped in iterator Java 8 and Java 9+ behave differently. Consider the following example:

import java.util.Date;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class Scratch {
  public static void main(String[] args) {
    Set<Date> dates = new HashSet<>();
    dates.add(new Date(100));
    dates.add(new Date(200));

    for (Date date : dates) {
        System.out.println("Initial "+date.getTime()+":"+date.hashCode());
        date.setTime(date.getTime()+42);
        System.out.println("Mutated "+date.getTime()+":"+date.hashCode()+"\n");
    }

    System.out.println("Size before remove iteration: "+dates.size());
    Iterator<Date> iterator = dates.iterator();
    while (iterator.hasNext()) {
        Date date = iterator.next();
        System.out.println("In loop: "+date.getTime()+":"+date.hashCode());
        iterator.remove();
    }
    System.out.println("Size after remove iteration: "+dates.size());
  }
}

After mutating objects inside the HashSet Java 8 refuses to remove them using iterator, check the size after removal loop. Java 8 output:

Initial 100:100
Mutated 142:142

Initial 200:200
Mutated 242:242

Size before remove iteration: 2
In loop: 142:142
In loop: 242:242
Size after remove iteration: 2

Java 9+ output is same as above but:

Size after remove iteration: 0

Why does this happen?

like image 964
Viktor Taranenko Avatar asked Jul 30 '21 15:07

Viktor Taranenko


People also ask

What is the difference between Java 8 and Java 9?

Java 8 applications use packages as a top-level component whereas Java 9 applications use modules as a top-level component. Each Java 9 module has only one module with one module descriptor whereas Java 8 package doesn't build multiple modules into a single module.

How do you find the difference between two hash sets?

Set<Integer> test1 = new HashSet<Integer>(); test1. add(1); test1. add(2); test1. add(3); Set<Integer> test2 = new HashSet<Integer>(); test2.

What is the difference between HashSet and Treeset?

Hash set and tree set both belong to the collection framework. HashSet is the implementation of the Set interface whereas Tree set implements sorted set. Tree set is backed by TreeMap while HashSet is backed by a hashmap.

How do you cast a set to a list?

The most straightforward way to convert a set to a list is by passing the set as an argument while creating the list. This calls the constructor and from there onwards the constructor takes care of the rest. Since the set has been converted to a list, the elements are now ordered.


1 Answers

Something changed about HashSet between Java 8 and Java 9, but the specifics are not actually that interesting, because the way you are using the Set is already specified to be wrong (emphasis mine):

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.

Since Date.equals() depends on the time the Date represents, you do exactly that.

And since you do that the behavior of the set is no longer specified.

Which means it can misbehave in any way/shape/form and still be a conforming implementation.

You could try to find out why Java 9 specifically behaves differently now (I don't know myself), but it doesn't change the fundamental issue that any JVM could at any point in time behave differently yet again if you use the set the wrong way.

Edit: out of curiosity I did investigate what exactly was different and found a relevant change: In both OpenJDK 8 and 9 HashSet is implemented based on a HashMap, so this all focuses on HashMap.

In Java 8 the remove() method of the relevant Iterator implementation contains this line:

K key = p.key;
removeNode(hash(key), key, null, false, false);

This re-hashes (i.e. gets the current hash) of the key (i.e. your Date) and tries to remove that from the Map. Since that new hash was never added in the first place (back when that key was added it had a different hash), this will not find a node and thus not remove anything.

In Java 9 that code looks like this instead:

removeNode(p.hash, p.key, null, false, false);

This will simply pass the previously calculated-and-remembered hash p.hash to the removeNode method which will thus be able to find and remove the node in question.

The changeset that introduced that change mentions this OpenJDK bug.

The comments in there (notably also by Doug Lea) seem to agree that "fixing" the behavior in the face of misused sets is not a goal, but that it could be faster to not re-compute the hash. In other words: that change was done for performance reasons and not for correctness reasons.

So to summarize and re-iterate: both of those behaviors are acceptable implementations, because by changing the equals() behavior of your set entries, you've broken the contract already.

like image 86
Joachim Sauer Avatar answered Oct 27 '22 17:10

Joachim Sauer