
Set that only needs equals

Tags: java

I'm curious, is there any Set that only requires .equals() to determine the uniqueness?

When looking at the Set classes in java.util, I can only find HashSet, which needs .hashCode(), and TreeSet (or SortedSet in general), which requires a Comparator or Comparable elements. I cannot find any class that uses only .equals().

Does it make sense that, if I have an .equals() method, it is sufficient to determine object uniqueness, so that a Set implementation could rely on .equals() alone? Or am I missing some reason why .equals() is not sufficient to determine object uniqueness in a Set implementation?

Note that I am aware of the Java practice that if we override .equals(), we should override .hashCode() as well to maintain the contract defined in Object.

tkokasih asked Dec 24 '22 20:12


1 Answer

On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.

The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
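To make the role of the comparator concrete, here is a small sketch showing that a TreeSet decides uniqueness from the comparator's result rather than from equals. The class and method names (ComparatorUniquenessDemo, distinctIgnoringCase) are made up for illustration; TreeSet and String.CASE_INSENSITIVE_ORDER are standard library.

```java
import java.util.Set;
import java.util.TreeSet;

public class ComparatorUniquenessDemo {
    // Hypothetical helper: counts how many distinct words remain when the
    // TreeSet orders (and therefore de-duplicates) them ignoring case.
    static int distinctIgnoringCase(String... words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String w : words) {
            set.add(w); // rejected when the comparator returns 0 for an existing element
        }
        return set.size();
    }

    public static void main(String[] args) {
        // "java", "Java" and "JAVA" compare as equal under the comparator,
        // even though String.equals says they differ.
        System.out.println(distinctIgnoringCase("java", "Java", "JAVA", "kotlin")); // prints 2
    }
}
```

The ordering is what lets the tree find a possible duplicate in a few steps instead of checking every element.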

For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every object already in the jumble. Adding n distinct objects requires n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly (quadratically).
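If you want to check the arithmetic, the following sketch counts equals calls while adding n distinct objects to an unordered list with a duplicate check. ComparisonCountDemo, Item and countComparisons are hypothetical names invented for this example, and the counter is a plain static field, so it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ComparisonCountDemo {
    // Incremented on every Item.equals call (illustration only, not thread-safe).
    static long equalsCalls = 0;

    // Hypothetical element type whose equals method counts its invocations.
    static final class Item {
        final int id;

        Item(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override
        public int hashCode() {
            return id; // kept consistent with equals, though never used here
        }
    }

    // Adds n distinct items to a list, checking for duplicates with contains,
    // and returns how many equals calls that took.
    static long countComparisons(int n) {
        equalsCalls = 0;
        List<Item> jumble = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Item item = new Item(i);
            if (!jumble.contains(item)) { // linear scan over everything added so far
                jumble.add(item);
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(5));    // prints 10
        System.out.println(countComparisons(1000)); // prints 499500
    }
}
```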

Because it would not give scalable performance, no such set implementation is in the standard library.


If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:

@Override
public int hashCode() {
    return 0; // or any other constant
}

Although it is required that equal objects have equal hash codes, it is never required for correctness that unequal objects have unequal hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.
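As a quick sanity check, here is a minimal sketch of a class with a constant hash code used in a HashSet. Point and ConstantHashDemo are hypothetical names chosen for this example only.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    // Hypothetical value class: real equals, constant hashCode.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return 0; // legal: equal objects still have equal hash codes
        }
    }

    // Adds two equal points and one different point; returns the set size.
    static int distinctCount() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // duplicate, rejected via equals
        set.add(new Point(3, 4));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

Every element collides, so the HashSet falls back on equals to tell them apart, which is why it stays correct but loses its speed advantage.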


Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:

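import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

public class ArraySet<E> extends AbstractSet<E> {
    // Backing list; elements are kept in insertion order.
    private final ArrayList<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        // list.contains does a linear scan, calling equals against each element.
        if (!list.contains(e)) {
            list.add(e);
            return true;
        }
        return false;
    }

    @Override
    public Iterator<E> iterator() {
        // The list iterator supports remove, which the inherited remove method relies on.
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}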

The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
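One way to convince yourself that add, contains and remove go through equals alone is to use elements whose hashCode throws. The sketch below repeats a compact ArraySet as a nested class purely so it compiles on its own; EqualsOnlyDemo, Name and exercise are hypothetical names for this illustration. Note that some inherited methods, such as the set's own hashCode(), or toString() on elements that don't override it, would still call the element's hashCode, so the demo avoids them.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

public class EqualsOnlyDemo {
    // Compact list-backed set, repeated here only to keep the example self-contained.
    static class ArraySet<E> extends AbstractSet<E> {
        private final ArrayList<E> list = new ArrayList<>();

        @Override
        public boolean add(E e) {
            if (list.contains(e)) {
                return false;
            }
            return list.add(e);
        }

        @Override
        public Iterator<E> iterator() {
            return list.iterator();
        }

        @Override
        public int size() {
            return list.size();
        }
    }

    // Hypothetical element type: usable equals, deliberately unusable hashCode.
    static final class Name {
        final String value;

        Name(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && ((Name) o).value.equals(value);
        }

        @Override
        public int hashCode() {
            throw new UnsupportedOperationException("hashCode should not be needed");
        }
    }

    // Runs add, contains and remove, and reports the results as one string.
    static String exercise() {
        ArraySet<Name> set = new ArraySet<>();
        boolean first = set.add(new Name("a"));      // new element
        boolean dup = set.add(new Name("a"));        // equal element, rejected
        set.add(new Name("b"));
        boolean has = set.contains(new Name("b"));   // inherited, iterates and calls equals
        boolean removed = set.remove(new Name("a")); // inherited, uses the iterator's remove
        return first + " " + dup + " " + has + " " + removed + " " + set.size();
    }

    public static void main(String[] args) {
        System.out.println(exercise()); // prints "true false true true 1"
    }
}
```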

Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.

In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.

Boann answered Jan 06 '23 14:01