I'm curious: is there any Set implementation that requires only .equals() to determine uniqueness? Looking at the Set classes in java.util, I can only find HashSet, which needs .hashCode(), and TreeSet (or SortedSet generally), which requires a Comparator (or Comparable elements). I cannot find any class that uses only .equals().
Does it make sense that, if I have an .equals() method, it is sufficient to determine object uniqueness, and therefore that a Set implementation could rely on .equals() alone? Or am I missing some reason why .equals() is not sufficient to determine object uniqueness in a Set implementation?
Note that I am aware of the Java practice that if we override .equals(), we should also override .hashCode() to maintain the contract defined in Object.
On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.
The point of a hash code or a comparator is that it provides a way to arrange objects in an organized structure (a hash table or a tree) that allows objects to be found quickly. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every object already in the jumble. Adding n distinct objects requires n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. The cost grows quadratically, so it scales terribly.
Because it would not give scalable performance, no such set implementation is in the standard library.
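To make the arithmetic concrete, here is a small sketch that counts equals calls directly. The names EqualsCountDemo, CountingKey and comparisonsFor are made up for this illustration, and it assumes every added object is distinct (the worst case):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: counts how many equals calls a list-backed
// uniqueness check makes when n distinct objects are added.
class EqualsCountDemo {

    // A key whose equals method increments a shared counter.
    static final class CountingKey {
        static long equalsCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Adds n distinct keys, checking each one against all earlier keys,
    // and returns the total number of equals calls made.
    static long comparisonsFor(int n) {
        CountingKey.equalsCalls = 0;
        List<CountingKey> jumble = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            CountingKey key = new CountingKey(i);
            if (!jumble.contains(key)) { // linear scan using equals
                jumble.add(key);
            }
        }
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(comparisonsFor(5));    // prints 10
        System.out.println(comparisonsFor(1000)); // prints 499500
    }
}
```

The counts match n * (n - 1) / 2 because the i-th added object is compared against the i - 1 objects already stored.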
If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:

    @Override
    public int hashCode() {
        return 0; // or any other constant
    }
Although it is required that equal objects have equal hash codes, it is never required for correctness that unequal objects have different hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will all end up jumbled together in a single hash table bucket. Performance will be bad, but it will work correctly.
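As a quick check that a constant hash code still gives correct set behaviour, here is a minimal sketch. The Point class and the ConstantHashDemo wrapper are hypothetical names invented for this example:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: a value class with a constant hashCode still
// behaves correctly in a HashSet, only slowly for large sets.
class ConstantHashDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 0; // legal: equal objects still have equal hash codes
        }
    }

    // Builds a set from three points, two of which are equal.
    static Set<Point> sample() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // equal to the first, so not added again
        set.add(new Point(3, 4));
        return set;
    }

    public static void main(String[] args) {
        Set<Point> set = sample();
        System.out.println(set.size());                     // prints 2
        System.out.println(set.contains(new Point(3, 4)));  // prints true
        System.out.println(set.contains(new Point(9, 9)));  // prints false
    }
}
```

Every element lands in the same bucket, so the HashSet falls back on equals to tell the elements apart.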
Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:
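    import java.util.AbstractSet;
    import java.util.ArrayList;
    import java.util.Iterator;

    public class ArraySet<E> extends AbstractSet<E> {
        // Backing list; uniqueness is enforced in add().
        private final ArrayList<E> list = new ArrayList<>();

        @Override
        public boolean add(E e) {
            // contains() scans the list, calling equals on each element.
            if (!list.contains(e)) {
                list.add(e);
                return true;
            }
            return false;
        }

        @Override
        public Iterator<E> iterator() {
            return list.iterator();
        }

        @Override
        public int size() {
            return list.size();
        }
    }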
The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example, its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
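A short usage sketch of the ArraySet above, with the class repeated as a nested class so the example compiles on its own. The Tag class is hypothetical, and its hashCode throws purely to demonstrate that add, contains and remove never consult it; don't do that in real code:

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

class ArraySetDemo {

    // Same ArraySet as above, nested here to keep the example self-contained.
    static class ArraySet<E> extends AbstractSet<E> {
        private final ArrayList<E> list = new ArrayList<>();

        @Override
        public boolean add(E e) {
            if (!list.contains(e)) {
                list.add(e);
                return true;
            }
            return false;
        }

        @Override
        public Iterator<E> iterator() {
            return list.iterator();
        }

        @Override
        public int size() {
            return list.size();
        }
    }

    // Hypothetical element type: case-insensitive equality, and a
    // hashCode that fails loudly if anything ever calls it.
    static final class Tag {
        final String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && ((Tag) o).name.equalsIgnoreCase(name);
        }

        @Override
        public int hashCode() {
            throw new UnsupportedOperationException("demo only: hashCode is never used");
        }
    }

    public static void main(String[] args) {
        ArraySet<Tag> tags = new ArraySet<>();
        System.out.println(tags.add(new Tag("java")));      // prints true
        System.out.println(tags.add(new Tag("JAVA")));      // prints false (equal by equals)
        System.out.println(tags.contains(new Tag("Java"))); // prints true (inherited contains)
        System.out.println(tags.remove(new Tag("jAvA")));   // prints true (inherited remove)
        System.out.println(tags.size());                    // prints 0
    }
}
```

Note that calling hashCode on the set itself would invoke the elements' hashCode through AbstractSet, so this sketch avoids doing so.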
Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.
In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.