Suppose I have a class not implementing the Comparable
interface like
class Dummy {
}
and a collection of instances of this class plus some function external to the class that allows comparing these instances partially (a map will be used for this purpose below):
Collection<Dummy> col = new ArrayList<>();
Map<Dummy, Integer> map = new HashMap<>();
for (int i = 0; i < 12; i++) {
Dummy d = new Dummy();
col.add(d);
map.put(d, i % 4);
}
Now I want to sort this collection using the TreeSet
class with a custom comparator:
TreeSet<Dummy> sorted = new TreeSet<>(new Comparator<Dummy>() {
@Override
public int compare(Dummy o1, Dummy o2) {
return map.get(o1) - map.get(o2);
}
});
sorted.addAll(col);
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals
, i.e. sometimes returns 0
for non-equal elements. My next attempt was to change the compare
method of the comparator to
@Override
public int compare(Dummy o1, Dummy o2) {
int d = map.get(o1) - map.get(o2);
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
return 1; // is this acceptable?
}
It seemingly gives the desired result for this simple demonstrational example but I'm still in doubt: is it correct to always return 1
for unequal (but undistinguishable by the map) objects? Such a relation still violates the general contact for the Comparator.compare()
method because sgn(compare(x, y)) == -sgn(compare(y, x))
is, generally, wrong. Do I really need to implement a correct total ordering for TreeSet
to work correctly or the above is enough? How to do this when an instance has no fields to compare?
For more real-life example imagine that, instead of Dummy
, you have a type parameter T
of some generic class. T
may have some fields and implement the equals()
method through them, but you don't know these fields and yet need to sort instances of this class according to some external function. Is this possible with the help of TreeSet
?
Using System.identityHashCode()
is a great idea but there is (not so small) chance of collision.
Besides possibility of such a collision, there is one more pitfall. Suppose you have 3 objects: a
, b
, c
such that map.get(a) = map.get(b) = map.get(c)
(here =
isn't assignment but the mathematical equality), identityHashCode(a) < identityHashCode(b) < identityHashCode(c)
, a.equals(c)
is true, but a.equals(b)
(and hence c.equals(b)
) is false. After adding these 3 elements to a TreeSet
in this order: a, b, c
you can get into a situation when all of them have been added to the set, that contradicts the prescribed behaviour of the Set
interface - it should not contain equal elements. How to deal with that?
In addition, it would be great if someone familiar with TreeSet
mechanics explained to me what does the term "well-defined" in the phrase "The behavior of a set is well-defined even if its ordering is inconsistent with equals" from TreeSet javadoc mean.
TreeSet shares an important function of setting and returning the comparator that can be used to order the elements in a TreeSet. The method returns a Null value if the set follows the natural ordering pattern of the elements.
Creating TreeSet with Comparator for user-defined Objects The return type for the comparator method is an integer(1, -1, 0). Syntax : TreeSet<Employees> gfg=new TreeSet<>(comparatorObject); gfg. add(new Employee(1,"raja",23);
Comparator interface is used to order the objects of user-defined class. It provides multiple sorting sequence i.e. you can sort the elements based on any data member. For instance it may be on rollno, name, age or anything else.
TreeSet uses tree data structure for storage. Objects are stored in sorted, ascending order. But we can iterate in descending order using method TreeSet. descendingIterator().
Unless you have an absolutely huge amount of Dummy objects and really bad luck, you can use System.identityHashCode()
to break ties:
Comparator.<Dummy>comparingInt(d -> map.get(d))
.thenComparingInt(System::identityHashCode)
Your comparator is not acceptable, since it violates the contract: you have d1 > d2 and d2 > d1 at the same time if they're not equal and don't share the same value in the map.
This answer covers just the first example in the question. The remainder of the question, and the various edits, are I think better answered as part of separate, focused questions.
The first example sets up 12 instances of Dummy
, creates a map that maps each instance to an Integer
in the range [0, 3], and then adds the 12 Dummy
instances to a TreeSet
. That TreeSet
is provided with a comparator that uses the Dummy-to-Integer map. The result is that the TreeSet
contains only four of the Dummy
instances. The example concludes with the following statement:
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements.
This last sentence is incorrect. The result contains fewer elements than were inserted because the comparator considers many of the instances to be duplicates and therefore they aren't inserted into the set. The equals
method doesn't enter the discussion at all. Therefore, the concept of "consistent with equals" isn't relevant to this discussion. TreeSet
never calls equals
. The comparator is the only thing that determines membership in the TreeSet
.
This seems like an unsatisfactory result, but only because we happen "know" that there are 12 distinct Dummy
instances. However, the TreeSet
doesn't "know" that they are distinct. It only knows how to compare the Dummy
instances using the comparator. When it does so, it finds that several are duplicates. That is, the comparator returns 0 sometimes even though it's being called with Dummy
instances that we believe to be distinct. That's why only four Dummy
instances end up in the TreeSet
.
I'm not entirely sure what the desired outcome is, but it seems like the result TreeSet
should contain all 12 instances ordered by values in the Dummy-to-Integer map. My suggestion was to use Guava's Ordering.arbitrary()
which provides a comparator that distinguishes between distinct-but-otherwise-equal elements, but does so in a way that satisfies the general contract of Comparator. If you create the TreeSet
like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(Ordering.arbitrary()));
the result will be that the TreeSet
contains all 12 Dummy
instances, sorted by Integer
value in the map, and with Dummy
instances that map to the same value ordered arbitrarily.
In the comments, you stated that the Ordering.arbitrary
doc "unequivocally cautions against using it in SortedSet
". That's not quite right; that doc says,
Because the ordering is identity-based, it is not "consistent with Object.equals(Object)" as defined by Comparator. Use caution when building a SortedSet or SortedMap from it, as the resulting collection will not behave exactly according to spec.
The phrase "not behave exactly according to spec" really means that it will behave "strangely" as described in the class doc of Comparator
:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if
c.compare(e1, e2)==0
has the same boolean value ase1.equals(e2)
for every e1 and e2 in S.Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of
equals
.For example, suppose one adds two elements a and b such that
(a.equals(b) && c.compare(a, b) != 0)
to an empty TreeSet with comparator c. The second add operation will return true (and the size of the tree set will increase) because a and b are not equivalent from the tree set's perspective, even though this is contrary to the specification of the Set.add method.
You seemed to indicate that this "strange" behavior was unacceptable, in that Dummy
elements that are equals
shouldn't appear in the TreeSet
. But the Dummy
class doesn't override equals
, so it seems like there's an additional requirement lurking behind here.
There are some additional questions added in later edits to the question, but as I mentioned above, I think these are better handled as separate question(s).
UPDATE 2018-12-22
After rereading the question edits and comments, I think I've finally figured out what you're looking for. You want a comparator over any object that provides a primary ordering based on some int-valued function that may result in duplicate values for unequal objects (as determined by the objects' equals
method). Therefore, a secondary ordering is required that provides a total ordering over all unequal objects, but which returns zero for objects that are equals
. This implies that the comparator should be consistent with equals.
Guava's Ordering.arbitrary
comes close in that it provides an arbitrary total ordering over any objects, but it only returns zero for objects that are identical (that is, ==
) but not for objects that are equals
. It's thus inconsistent with equals.
It sounds, then, that you want a comparator that provides an arbitrary ordering over unequal objects. Here's a function that creates one:
static Comparator<Object> arbitraryUnequal() {
Map<Object, Integer> map = new HashMap<>();
return (o1, o2) -> Integer.compare(map.computeIfAbsent(o1, x -> map.size()),
map.computeIfAbsent(o2, x -> map.size()));
}
Essentially, this assigns a sequence number to every newly seen unequal object and keeps these numbers in a map held by the comparator. It uses the map's size as the counter. Since objects are never removed from this map, the size and thus the sequence number always increases.
(If you intend for this comparator to be used concurrently, e.g., in a parallel sort, the HashMap
should be replaced with a ConcurrentHashMap
and the size trick should be modified to use an AtomicInteger
that's incremented when new entries are added.)
Note that the map in this comparator builds up entries for every unequal object that it's ever seen. If this is attached to a TreeSet
, objects will persist in the comparator's map even after they've been removed from the TreeSet
. This is necessary so that if objects are added or removed, they'll retain consistent ordering over time. Guava's Ordering.arbitrary
uses weak references to allow objects to be collected if they're no longer used. We can't do that, because we need to preserve the ordering of non-identical but equal objects.
You'd use it like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(arbitraryUnequal()));
You had also asked what "well-defined" means in the following:
The behavior of a set is well-defined even if its ordering is inconsistent with equals
Suppose you were to use a TreeSet
using a comparator that's inconsistent with equals, such as the one using Guava's Ordering.arbitrary
shown above. The TreeSet
will still work as expected, consistent with itself. That is, it will maintain objects in a total ordering, it will not contain any two objects for which the comparator returns zero, and all its methods will work as specified. However, it is possible for there to be an object for which contains
returns true (since that's computed using the comparator) but for which equals
is false if called with the object actually in the set.
For example, BigDecimal
is Comparable
but its comparison method is inconsistent with equals:
> BigDecimal z = new BigDecimal("0.0")
> BigDecimal zz = new BigDecimal("0.00")
> z.compareTo(zz)
0
> z.equals(zz)
false
> TreeSet<BigDecimal> ts = new TreeSet<>()
> ts.add(z)
> HashSet<BigDecimal> hs = new HashSet<>(ts)
> hs.equals(ts)
true
> ts.contains(zz)
true
> hs.contains(zz)
false
This is what the spec means when it says things can behave "strangely". We have two sets that are equal. Yet they report different results for contains
of the same object, and the TreeSet
reports that it contains an object even though that object is unequal to an object in the set.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With