Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparator suitable for TreeSet when there is no distinguishing field

Suppose I have a class not implementing the Comparable interface like

class Dummy {
}

and a collection of instances of this class plus some function external to the class that allows comparing these instances partially (a map will be used for this purpose below):

Collection<Dummy> col = new ArrayList<>();
Map<Dummy, Integer> map = new HashMap<>();
for (int i = 0; i < 12; i++) {
    Dummy d = new Dummy();
    col.add(d);
    map.put(d, i % 4);
}

Now I want to sort this collection using the TreeSet class with a custom comparator:

TreeSet<Dummy> sorted = new TreeSet<>(new Comparator<Dummy>() {
    @Override
    public int compare(Dummy o1, Dummy o2) {
        return map.get(o1) - map.get(o2);
    }
});
sorted.addAll(col);

The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements. My next attempt was to change the compare method of the comparator to

@Override
public int compare(Dummy o1, Dummy o2) {
    int d = map.get(o1) - map.get(o2);
    if (d != 0)
        return d;
    if (o1.equals(o2))
        return 0;
    return 1; // is this acceptable?
}

It seemingly gives the desired result for this simple demonstrational example but I'm still in doubt: is it correct to always return 1 for unequal (but undistinguishable by the map) objects? Such a relation still violates the general contact for the Comparator.compare() method because sgn(compare(x, y)) == -sgn(compare(y, x)) is, generally, wrong. Do I really need to implement a correct total ordering for TreeSet to work correctly or the above is enough? How to do this when an instance has no fields to compare?

For more real-life example imagine that, instead of Dummy, you have a type parameter T of some generic class. T may have some fields and implement the equals() method through them, but you don't know these fields and yet need to sort instances of this class according to some external function. Is this possible with the help of TreeSet?

Edit

Using System.identityHashCode() is a great idea but there is (not so small) chance of collision.

Besides possibility of such a collision, there is one more pitfall. Suppose you have 3 objects: a, b, c such that map.get(a) = map.get(b) = map.get(c) (here = isn't assignment but the mathematical equality), identityHashCode(a) < identityHashCode(b) < identityHashCode(c), a.equals(c) is true, but a.equals(b) (and hence c.equals(b)) is false. After adding these 3 elements to a TreeSet in this order: a, b, c you can get into a situation when all of them have been added to the set, that contradicts the prescribed behaviour of the Set interface - it should not contain equal elements. How to deal with that?

In addition, it would be great if someone familiar with TreeSet mechanics explained to me what does the term "well-defined" in the phrase "The behavior of a set is well-defined even if its ordering is inconsistent with equals" from TreeSet javadoc mean.

like image 814
John McClane Avatar asked Dec 11 '18 22:12

John McClane


People also ask

Can we use comparator with TreeSet?

TreeSet shares an important function of setting and returning the comparator that can be used to order the elements in a TreeSet. The method returns a Null value if the set follows the natural ordering pattern of the elements.

How do you write a comparator for TreeSet?

Creating TreeSet with Comparator for user-defined Objects The return type for the comparator method is an integer(1, -1, 0). Syntax : TreeSet<Employees> gfg=new TreeSet<>(comparatorObject); gfg. add(new Employee(1,"raja",23);

What is significance of using comparator interface while creating a TreeSet class object?

Comparator interface is used to order the objects of user-defined class. It provides multiple sorting sequence i.e. you can sort the elements based on any data member. For instance it may be on rollno, name, age or anything else.

Which of the following methods does TreeSet use to determine order?

TreeSet uses tree data structure for storage. Objects are stored in sorted, ascending order. But we can iterate in descending order using method TreeSet. descendingIterator().


2 Answers

Unless you have an absolutely huge amount of Dummy objects and really bad luck, you can use System.identityHashCode()to break ties:

Comparator.<Dummy>comparingInt(d -> map.get(d))
          .thenComparingInt(System::identityHashCode)

Your comparator is not acceptable, since it violates the contract: you have d1 > d2 and d2 > d1 at the same time if they're not equal and don't share the same value in the map.

like image 86
JB Nizet Avatar answered Oct 29 '22 16:10

JB Nizet


This answer covers just the first example in the question. The remainder of the question, and the various edits, are I think better answered as part of separate, focused questions.

The first example sets up 12 instances of Dummy, creates a map that maps each instance to an Integer in the range [0, 3], and then adds the 12 Dummy instances to a TreeSet. That TreeSet is provided with a comparator that uses the Dummy-to-Integer map. The result is that the TreeSet contains only four of the Dummy instances. The example concludes with the following statement:

The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements.

This last sentence is incorrect. The result contains fewer elements than were inserted because the comparator considers many of the instances to be duplicates and therefore they aren't inserted into the set. The equals method doesn't enter the discussion at all. Therefore, the concept of "consistent with equals" isn't relevant to this discussion. TreeSet never calls equals. The comparator is the only thing that determines membership in the TreeSet.

This seems like an unsatisfactory result, but only because we happen "know" that there are 12 distinct Dummy instances. However, the TreeSet doesn't "know" that they are distinct. It only knows how to compare the Dummy instances using the comparator. When it does so, it finds that several are duplicates. That is, the comparator returns 0 sometimes even though it's being called with Dummy instances that we believe to be distinct. That's why only four Dummy instances end up in the TreeSet.

I'm not entirely sure what the desired outcome is, but it seems like the result TreeSet should contain all 12 instances ordered by values in the Dummy-to-Integer map. My suggestion was to use Guava's Ordering.arbitrary() which provides a comparator that distinguishes between distinct-but-otherwise-equal elements, but does so in a way that satisfies the general contract of Comparator. If you create the TreeSet like this:

SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
                                                  .thenComparing(Ordering.arbitrary()));

the result will be that the TreeSet contains all 12 Dummy instances, sorted by Integer value in the map, and with Dummy instances that map to the same value ordered arbitrarily.

In the comments, you stated that the Ordering.arbitrary doc "unequivocally cautions against using it in SortedSet". That's not quite right; that doc says,

Because the ordering is identity-based, it is not "consistent with Object.equals(Object)" as defined by Comparator. Use caution when building a SortedSet or SortedMap from it, as the resulting collection will not behave exactly according to spec.

The phrase "not behave exactly according to spec" really means that it will behave "strangely" as described in the class doc of Comparator:

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.

Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.

For example, suppose one adds two elements a and b such that (a.equals(b) && c.compare(a, b) != 0) to an empty TreeSet with comparator c. The second add operation will return true (and the size of the tree set will increase) because a and b are not equivalent from the tree set's perspective, even though this is contrary to the specification of the Set.add method.

You seemed to indicate that this "strange" behavior was unacceptable, in that Dummy elements that are equals shouldn't appear in the TreeSet. But the Dummy class doesn't override equals, so it seems like there's an additional requirement lurking behind here.

There are some additional questions added in later edits to the question, but as I mentioned above, I think these are better handled as separate question(s).

UPDATE 2018-12-22

After rereading the question edits and comments, I think I've finally figured out what you're looking for. You want a comparator over any object that provides a primary ordering based on some int-valued function that may result in duplicate values for unequal objects (as determined by the objects' equals method). Therefore, a secondary ordering is required that provides a total ordering over all unequal objects, but which returns zero for objects that are equals. This implies that the comparator should be consistent with equals.

Guava's Ordering.arbitrary comes close in that it provides an arbitrary total ordering over any objects, but it only returns zero for objects that are identical (that is, ==) but not for objects that are equals. It's thus inconsistent with equals.

It sounds, then, that you want a comparator that provides an arbitrary ordering over unequal objects. Here's a function that creates one:

static Comparator<Object> arbitraryUnequal() {
    Map<Object, Integer> map = new HashMap<>();
    return (o1, o2) -> Integer.compare(map.computeIfAbsent(o1, x -> map.size()),
                                       map.computeIfAbsent(o2, x -> map.size()));
}

Essentially, this assigns a sequence number to every newly seen unequal object and keeps these numbers in a map held by the comparator. It uses the map's size as the counter. Since objects are never removed from this map, the size and thus the sequence number always increases.

(If you intend for this comparator to be used concurrently, e.g., in a parallel sort, the HashMap should be replaced with a ConcurrentHashMap and the size trick should be modified to use an AtomicInteger that's incremented when new entries are added.)

Note that the map in this comparator builds up entries for every unequal object that it's ever seen. If this is attached to a TreeSet, objects will persist in the comparator's map even after they've been removed from the TreeSet. This is necessary so that if objects are added or removed, they'll retain consistent ordering over time. Guava's Ordering.arbitrary uses weak references to allow objects to be collected if they're no longer used. We can't do that, because we need to preserve the ordering of non-identical but equal objects.

You'd use it like this:

SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
                                                  .thenComparing(arbitraryUnequal()));

You had also asked what "well-defined" means in the following:

The behavior of a set is well-defined even if its ordering is inconsistent with equals

Suppose you were to use a TreeSet using a comparator that's inconsistent with equals, such as the one using Guava's Ordering.arbitrary shown above. The TreeSet will still work as expected, consistent with itself. That is, it will maintain objects in a total ordering, it will not contain any two objects for which the comparator returns zero, and all its methods will work as specified. However, it is possible for there to be an object for which contains returns true (since that's computed using the comparator) but for which equals is false if called with the object actually in the set.

For example, BigDecimal is Comparable but its comparison method is inconsistent with equals:

> BigDecimal z = new BigDecimal("0.0")
> BigDecimal zz = new BigDecimal("0.00")
> z.compareTo(zz)
0
> z.equals(zz)
false
> TreeSet<BigDecimal> ts = new TreeSet<>()
> ts.add(z)
> HashSet<BigDecimal> hs = new HashSet<>(ts)
> hs.equals(ts)
true
> ts.contains(zz)
true
> hs.contains(zz)
false

This is what the spec means when it says things can behave "strangely". We have two sets that are equal. Yet they report different results for contains of the same object, and the TreeSet reports that it contains an object even though that object is unequal to an object in the set.

like image 42
Stuart Marks Avatar answered Oct 29 '22 16:10

Stuart Marks