I have the below code which creates a TreeSet using a comparator based on string length.
public class TreeSetComparator {
public static void main(String[] args) {
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
}
}
To my surprise the output of the above is
[aa]
While I would expect
[aa, bb]
or
[bb, aa]
The "bb" part disappears, which seems to be contrary to the SortedSet contract. The comparator is supposed to only sort the elements and not determine their uniqueness, which is normally determined by equals.
On the other hand, if I enhance the comparator to always return non-zero for unequal items like below, only then do I get the correct results.
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
The output now is [aa, bb]
as I would expect.
Is the above a bug in the TreeSet implementation?
My environment is as follows:
mvn --version 21:40:22
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"
UPDATE
Here is a related post along with suggestions on how to fix the issue in a future version of Java: https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html
TreeSet shares an important function of setting and returning the comparator that can be used to order the elements in a TreeSet. The method returns a Null value if the set follows the natural ordering pattern of the elements.
TreeSet is the implementation class of Set Interface. It follows a natural sorting order or you can customize it using a comparator and it also does not allow duplicates.
The implementation of a TreeSet is not synchronized. This means that if multiple threads access a tree set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing some object that naturally encapsulates the set.
TreeSet maintains an object in sorted order. SortedSet maintains an object in sorted order.
This it not a bug. At least not in TreeSet
.
From the javadoc, emphasis by me:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
So because "aa" and "bb" both have a length of 2 they are deemed equal by compareTo
and thus by the TreeSet
.
By definition, consistent with equals means:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With