Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A set-like collection with a customizable "equals"

I have a data class:

public class MyData {
  final Integer alpha;
  final Double beta;
  final Integer foo;
  final Double bar;
}

I need .equals and .hashCode to have the conventional definition involving all four fields. But I have another important requirement:

Given a large number of MyData objects, I need to rapidly check whether a new MyData object matches any of the existing ones on the .alpha and .beta fields only.

Three approaches I don't want to take:

  1. Composite object:
public class MyData {
  final MyDataKey alphaAndBeta;
  final Integer foo;
  final Double bar;
}
public class MyDataKey {
  final Integer alpha;
  final Double beta;
}

While I could then do lookups against a HashSet<MyDataKey>, it's inelegant because all other uses of the object will need to refer to dataObj.alphaAndBeta.alpha instead of dataObj.alpha.

  1. Comparator:
public class OnlyAlphaAndBeta implements Comparator<MyData> {
  int compare(MyData a, MyData b) {...}
}

This would then let a new TreeSet<MyData>(new OnlyAlphaAndBeta()) do the lookups I want; but with O(log(N)) instead of O(1).

  1. Multi-level lookup:
public class MyDataLookup {
  Map<Integer, Set<Double>> existingAlphaBeta;
  
  boolean contains(MyData query) {
    Set<Double> betas = this.existingAlphaBeta.get(query.alpha);
    if (betas == null) {
      return false;
    }
    return betas.contains(query.beta);
  }

  boolean add(MyData toInsert) {...};
}

This does the job, in O(1), but what if the key was more than just 2 fields? I could keep nesting Map<A, Map<B, Map<C, ...>>> for each field in the key, but this doesn't seem right. Surely I'd rather compute just one hash and look it up in one table.


I think what I'm looking for is something like HashSet, but which can be specialized to use something other than the .equals and .hashCode methods, analogous to how Comparator redefines ordering for SortedSet. Such a collection wouldn't fulfill the Set contract anymore, but it would be "set-like".

Does something like this exist in any of the big, well-maintained Java utility libraries? Alternately, am I overlooking some obvious way of accomplishing my goals?

like image 737
andrewtinka Avatar asked Dec 05 '25 10:12

andrewtinka


1 Answers

Using a Map is the right approach, but you can encapsulate it in a Set implementation having precisely the intended behavior of “a Set with custom equals”.

public class CustomSet<E> extends AbstractSet<E> {
    private final Function<E, Object> theKeyFunction;
    private final HashMap<Object, E> backend = new HashMap<>();

    public CustomSet(Function<E,Object> keyFunction) {
        theKeyFunction = Objects.requireNonNull(keyFunction);
    }

    @Override
    public int size() {
        return backend.size();
    }

    @Override
    public boolean add(E e) {
        Objects.requireNonNull(e);
        return backend.putIfAbsent(theKeyFunction.apply(e), e) == null;
    }

    @Override
    public boolean contains(Object o) {
        if(o == null) return false;
        @SuppressWarnings("unchecked") E e = (E)o;
        Object key;
        try { key = theKeyFunction.apply(e); }
        catch(ClassCastException ex) { return false; }
        return backend.containsKey(key);
    }

    @Override
    public boolean remove(Object o) {
        if(o == null) return false;
        @SuppressWarnings("unchecked") E e = (E)o;
        Object key;
        try { key = theKeyFunction.apply(e); }
        catch(ClassCastException ex) { return false; }
        return backend.remove(key) != null;
    }

    @Override
    public void clear() {
        backend.clear();
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        return backend.values().retainAll(c);
    }

    @Override
    public boolean removeIf(Predicate<? super E> filter) {
        return backend.values().removeIf(filter);
    }

    @Override
    public void forEach(Consumer<? super E> action) {
        backend.values().forEach(action);
    }

    @Override
    public Iterator<E> iterator() {
        return backend.values().iterator();
    }

    @Override
    public Spliterator<E> spliterator() {
        return backend.values().spliterator();
    }

    @Override
    public Object[] toArray() {
        return backend.values().toArray();
    }

    @Override
    public <T> T[] toArray(T[] a) {
        return backend.values().toArray(a);
    }    
}

To keep it simple, this Set does not support null.

This class overrides some methods it doesn’t have to, to provide better performance when iterating or streaming over it. Besides that, it’s rather simple. If you think, “but a Set that internally uses a Map is quite inefficient”, look at the source code of HashSet or TreeSet

This set implementation can be tested like

record Person(String name, int age) {}

Set<Person> nameSet = new CustomSet<>(Person::name);
Set<Person> ageSet = new CustomSet<>(Person::age);

for(String name: List.of("John", "Paul", "George", "Ringo")) {
    for(int age: new int[] { 20, 24, 27, 31 }) {
        Person p = new Person(name, age);
        if(nameSet.add(p)) System.out.println("added " + p + " to nameSet");
        if(ageSet.add(p)) System.out.println("added " + p + " to ageSet");
    }
}
System.out.println();
System.out.println("nameSet: " + nameSet);
System.out.println("ageSet:  " + ageSet);
System.out.println();
Person p = new Person("Paul", 100);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet  contains " + p + "? " + ageSet.contains(p));
p = new Person("Bob", 27);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet  contains " + p + "? " + ageSet.contains(p));
added Person[name=John, age=20] to nameSet
added Person[name=John, age=20] to ageSet
added Person[name=John, age=24] to ageSet
added Person[name=John, age=27] to ageSet
added Person[name=John, age=31] to ageSet
added Person[name=Paul, age=20] to nameSet
added Person[name=George, age=20] to nameSet
added Person[name=Ringo, age=20] to nameSet

nameSet: [Person[name=George, age=20], Person[name=John, age=20], Person[name=Ringo, age=20], Person[name=Paul, age=20]]
ageSet:  [Person[name=John, age=20], Person[name=John, age=24], Person[name=John, age=27], Person[name=John, age=31]]

nameSet contains Person[name=Paul, age=100]?true
ageSet  contains Person[name=Paul, age=100]?false
nameSet contains Person[name=Bob, age=27]?false
ageSet  contains Person[name=Bob, age=27]?true

demonstrating the different understanding of equality of the two sets, which leads to the same warning as applies to TreeSet with comparators not consistent with equals. Mixing sets with different key functions can lead to the same weird behavior as mixing sorted sets with different comparators or mixing such sets with an ordinary hash set.

If the key consists of multiple properties, a dedicated key object is the way to go, but that doesn’t mean that the application domain object has to be a composed object:

record MyData(int alpha, double beta, int foo, double bar) {}

Set<MyData> set = new CustomSet<>(d -> {
    record Key(int alpha, double beta) {}
    return new Key(d.alpha(), d.beta());
});

set.add(new MyData(1, 1.0, 100, 1.23));
System.out.println(set.contains(new MyData(1, 1.0, -1, Double.NaN))); // true

Solutions for older Java versions without record are a bit more verbose, but the principle stays the same. If you don’t need maximum performance, you can also use List keys as they have a working equals and hashCode implementation:

// Java 8 compatible
Set<MyData> set = new CustomSet<>(d -> Arrays.asList(d.alpha(), d.beta()));
like image 139
Holger Avatar answered Dec 06 '25 23:12

Holger