Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java TreeSet with length Comparator bug?

I have the below code which creates a TreeSet using a comparator based on string length.

public class TreeSetComparator {
    public static void main(String[] args) {
        SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
        sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
        System.out.println(sortedSet);
    }
}

To my surprise the output of the above is

[aa]

While I would expect

[aa, bb]

or

[bb, aa]

The "bb" part disappears, which seems to be contrary to the SortedSet contract. The comparator is supposed to only sort the elements and not determine their uniqueness, which is normally determined by equals.

On the other hand, if I enhance the comparator to always return non-zero for unequal items like below, only then do I get the correct results.

    SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
    sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
    System.out.println(sortedSet);

The output now is [aa, bb] as I would expect.

Is the above a bug in the TreeSet implementation?

My environment is as follows:

mvn --version                                                                                                                                            21:40:22
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"

UPDATE

Here is a related post along with suggestions on how to fix the issue in a future version of Java: https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html

like image 745
Georgios F. Avatar asked Aug 08 '18 20:08

Georgios F.


People also ask

Can we use comparator with TreeSet in Java?

TreeSet shares an important function of setting and returning the comparator that can be used to order the elements in a TreeSet. The method returns a Null value if the set follows the natural ordering pattern of the elements.

Does TreeSet use comparator?

TreeSet is the implementation class of Set Interface. It follows a natural sorting order or you can customize it using a comparator and it also does not allow duplicates.

Does TreeSet is synchronized or not?

The implementation of a TreeSet is not synchronized. This means that if multiple threads access a tree set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing some object that naturally encapsulates the set.

What is difference between SortedSet and TreeSet?

TreeSet maintains an object in sorted order. SortedSet maintains an object in sorted order.


1 Answers

This it not a bug. At least not in TreeSet.

From the javadoc, emphasis by me:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

So because "aa" and "bb" both have a length of 2 they are deemed equal by compareTo and thus by the TreeSet.

By definition, consistent with equals means:

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.

like image 145
Marvin Avatar answered Oct 31 '22 16:10

Marvin