I have a data class:
public class MyData {
final Integer alpha;
final Double beta;
final Integer foo;
final Double bar;
}
I need .equals and .hashCode to have the conventional definition involving all four fields. But I have another important requirement:
Given a large number of MyData objects, I need to rapidly check whether a new MyData object matches any of the existing ones on the .alpha and .beta fields only.
Three approaches I don't want to take:
public class MyData {
final MyDataKey alphaAndBeta;
final Integer foo;
final Double bar;
}
public class MyDataKey {
final Integer alpha;
final Double beta;
}
While I could then do lookups against a HashSet<MyDataKey>, it's inelegant because all other uses of the object will need to refer to dataObj.alphaAndBeta.alpha instead of dataObj.alpha.
public class OnlyAlphaAndBeta implements Comparator<MyData> {
int compare(MyData a, MyData b) {...}
}
This would then let a new TreeSet<MyData>(new OnlyAlphaAndBeta()) do the lookups I want; but with O(log(N)) instead of O(1).
public class MyDataLookup {
Map<Integer, Set<Double>> existingAlphaBeta;
boolean contains(MyData query) {
Set<Double> betas = this.existingAlphaBeta.get(query.alpha);
if (betas == null) {
return false;
}
return betas.contains(query.beta);
}
boolean add(MyData toInsert) {...};
}
This does the job, in O(1), but what if the key was more than just 2 fields? I could keep nesting Map<A, Map<B, Map<C, ...>>> for each field in the key, but this doesn't seem right. Surely I'd rather compute just one hash and look it up in one table.
I think what I'm looking for is something like HashSet, but which can be specialized to use something other than the .equals and .hashCode methods, analogous to how Comparator redefines ordering for SortedSet. Such a collection wouldn't fulfill the Set contract anymore, but it would be "set-like".
Does something like this exist in any of the big, well-maintained Java utility libraries? Alternately, am I overlooking some obvious way of accomplishing my goals?
Using a Map is the right approach, but you can encapsulate it in a Set implementation having precisely the intended behavior of “a Set with custom equals”.
public class CustomSet<E> extends AbstractSet<E> {
private final Function<E, Object> theKeyFunction;
private final HashMap<Object, E> backend = new HashMap<>();
public CustomSet(Function<E,Object> keyFunction) {
theKeyFunction = Objects.requireNonNull(keyFunction);
}
@Override
public int size() {
return backend.size();
}
@Override
public boolean add(E e) {
Objects.requireNonNull(e);
return backend.putIfAbsent(theKeyFunction.apply(e), e) == null;
}
@Override
public boolean contains(Object o) {
if(o == null) return false;
@SuppressWarnings("unchecked") E e = (E)o;
Object key;
try { key = theKeyFunction.apply(e); }
catch(ClassCastException ex) { return false; }
return backend.containsKey(key);
}
@Override
public boolean remove(Object o) {
if(o == null) return false;
@SuppressWarnings("unchecked") E e = (E)o;
Object key;
try { key = theKeyFunction.apply(e); }
catch(ClassCastException ex) { return false; }
return backend.remove(key) != null;
}
@Override
public void clear() {
backend.clear();
}
@Override
public boolean retainAll(Collection<?> c) {
return backend.values().retainAll(c);
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
return backend.values().removeIf(filter);
}
@Override
public void forEach(Consumer<? super E> action) {
backend.values().forEach(action);
}
@Override
public Iterator<E> iterator() {
return backend.values().iterator();
}
@Override
public Spliterator<E> spliterator() {
return backend.values().spliterator();
}
@Override
public Object[] toArray() {
return backend.values().toArray();
}
@Override
public <T> T[] toArray(T[] a) {
return backend.values().toArray(a);
}
}
To keep it simple, this Set does not support null.
This class overrides some methods it doesn’t have to, to provide better performance when iterating or streaming over it. Besides that, it’s rather simple. If you think, “but a Set that internally uses a Map is quite inefficient”, look at the source code of HashSet or TreeSet…
This set implementation can be tested like
record Person(String name, int age) {}
Set<Person> nameSet = new CustomSet<>(Person::name);
Set<Person> ageSet = new CustomSet<>(Person::age);
for(String name: List.of("John", "Paul", "George", "Ringo")) {
for(int age: new int[] { 20, 24, 27, 31 }) {
Person p = new Person(name, age);
if(nameSet.add(p)) System.out.println("added " + p + " to nameSet");
if(ageSet.add(p)) System.out.println("added " + p + " to ageSet");
}
}
System.out.println();
System.out.println("nameSet: " + nameSet);
System.out.println("ageSet: " + ageSet);
System.out.println();
Person p = new Person("Paul", 100);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet contains " + p + "? " + ageSet.contains(p));
p = new Person("Bob", 27);
System.out.println("nameSet contains " + p + "? " + nameSet.contains(p));
System.out.println("ageSet contains " + p + "? " + ageSet.contains(p));
added Person[name=John, age=20] to nameSet
added Person[name=John, age=20] to ageSet
added Person[name=John, age=24] to ageSet
added Person[name=John, age=27] to ageSet
added Person[name=John, age=31] to ageSet
added Person[name=Paul, age=20] to nameSet
added Person[name=George, age=20] to nameSet
added Person[name=Ringo, age=20] to nameSet
nameSet: [Person[name=George, age=20], Person[name=John, age=20], Person[name=Ringo, age=20], Person[name=Paul, age=20]]
ageSet: [Person[name=John, age=20], Person[name=John, age=24], Person[name=John, age=27], Person[name=John, age=31]]
nameSet contains Person[name=Paul, age=100]?true
ageSet contains Person[name=Paul, age=100]?false
nameSet contains Person[name=Bob, age=27]?false
ageSet contains Person[name=Bob, age=27]?true
demonstrating the different understanding of equality of the two sets, which leads to the same warning as applies to TreeSet with comparators not consistent with equals. Mixing sets with different key functions can lead to the same weird behavior as mixing sorted sets with different comparators or mixing such sets with an ordinary hash set.
If the key consists of multiple properties, a dedicated key object is the way to go, but that doesn’t mean that the application domain object has to be a composed object:
record MyData(int alpha, double beta, int foo, double bar) {}
Set<MyData> set = new CustomSet<>(d -> {
record Key(int alpha, double beta) {}
return new Key(d.alpha(), d.beta());
});
set.add(new MyData(1, 1.0, 100, 1.23));
System.out.println(set.contains(new MyData(1, 1.0, -1, Double.NaN))); // true
Solutions for older Java versions without record are a bit more verbose, but the principle stays the same. If you don’t need maximum performance, you can also use List keys as they have a working equals and hashCode implementation:
// Java 8 compatible
Set<MyData> set = new CustomSet<>(d -> Arrays.asList(d.alpha(), d.beta()));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With