Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Guava Sets.intersection bad performance

I've encountered a strange problem in production today. Though I love Guava, I ran into a use case where Guava's Sets.intersection() was performing pretty badly. I've written a sample code:

Set<Long> cache = new HashSet<>();
for (long i = 0; i < 1000000; i++) {
    cache.add(i);
}
Set<Long> keys = new HashSet<>();
for (long i = 0; i < 100; i++) {
    keys.add(i);
}
long start = System.currentTimeMillis();
Set<Long> foundKeys = new HashSet<>();
for (Long key : keys) {
    if (cache.contains(key)) {
        foundKeys.add(key);
    }
}
System.out.println("Java search: " + (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
SetView<Long> intersection = Sets.intersection(keys, cache);
System.out.println("Guava search: " + (System.currentTimeMillis() - start));

I've tried to create a similar production scenario where I've a keys cache, and I'm looking for all the keys present in the cache. Strangely, Guava search is taking a lot longer than Java search. After running this I got:

Java search: 0
Guava search: 36

Can anyone tell why this isn't suited for my use case or is there a bug in Guava?

like image 619
Heisenberg Avatar asked May 21 '15 12:05

Heisenberg


1 Answers

It turns out the problem was multiple calls to SetView.size(). As SetView is a (live) view of the intersection of the two sets, the intersection size needs to be recalculated every time.

public static <E> SetView<E> intersection( final Set<E> set1, final Set<?> set2) {
//...
  return new SetView<E>() {
    @Override public Iterator<E> iterator() {
      return Iterators.filter(set1.iterator(), inSet2);
    }
    @Override public int size() {
      return Iterators.size(iterator());
    }
    //...
  };
}

As can be seen here, what recalculation means in this case is iterating across the entire view, which can be quite time-consuming.

The way to get around this is therefore either to make sure size() is only called once and the value is stored (if you know the underlying sets won't change), or if that isn't possible, create a copy of the intersection via ImmutableSet.copyOf() (for example).

like image 119
biziclop Avatar answered Sep 28 '22 07:09

biziclop