Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently determining whether or not two collections have any items in common in Java

I know that, in Java, I can manually determine if two collections have any overlap by turning one of them into a set then iterating over the other doing contains checks:

<T> boolean anyInCommon(Iterable<T> collection1, Set<T> collection2) {
    for (T item : collection1)
        if (collection2.contains(item))
            return true;
    return false;
}

or alternatively:

<T> boolean anyInCommon(Iterable<T> collection1, Set<T> collection2) {
    return collection1.stream().anyMatch(collection2::contains);
}

But are there existing utility methods that do this and intelligently choose which collection to iterate, which to turn into a set, take advantage of one already being a set, etc? I know that Guava has Sets.intersection, but it computes the entire intersection instead of just whether or not it's empty.

Note that I'd prefer the comparison to short-circuit as soon as any common item is found. Checking if two huge collections have overlap should take time proportional to the number of non-overlapping items (or better), instead of the total number of items.

like image 570
Craig Gidney Avatar asked Nov 01 '22 16:11

Craig Gidney


1 Answers

Partial answer for when the collections are already Sets.

Sets.intersection is actually closer to what I wanted than I thought because its result is not precomputed. Instead, it's a view of the intersection that's computed on the fly.

Take a look at the anonymous class returned by intersection:

final Predicate<Object> inSet2 = Predicates.in(set2);
return new SetView<E>() {
  @Override public Iterator<E> iterator() {
    return Iterators.filter(set1.iterator(), inSet2);
  }
  @Override public int size() {
    return Iterators.size(iterator());
  }
  @Override public boolean isEmpty() {
    return !iterator().hasNext();
  }
  @Override public boolean contains(Object object) {
    return set1.contains(object) && set2.contains(object);
  }
  @Override public boolean containsAll(Collection<?> collection) {
    return set1.containsAll(collection)
        && set2.containsAll(collection);
  }
};

The isEmpty method doesn't go over every item. Instead, it iterates over the first set while checking if items are in the second set. As soon as it finds one, it returns true. If you're unlucky you'll iterate all the items in set1 that aren't in set2 first, but that's probably unavoidable and better than always iterating all items.

In other words, if you already have sets, an efficient solution that short-circuits appropriately is just:

boolean overlaps = !Sets.intersections(set1, set2).isEmpty();

This won't iterate over the smaller set instead of the larger set, or deal with non-set collections, but it's often useful.

like image 138
Craig Gidney Avatar answered Nov 09 '22 08:11

Craig Gidney