
When to use Collectors.groupingByConcurrent?

I'm failing to understand the exact use case for Collectors.groupingByConcurrent. From the JavaDocs:

Returns a concurrent Collector implementing a cascaded "group by" operation on input elements of type T...
This is a concurrent and unordered Collector.
...

Maybe the key words here are cascaded "group by". Does that point to something in how the collector actually does the accumulation? (I looked at the source, but it got intricate very quickly.)


When I test it with a fake ConcurrentMap:

class FakeConcurrentMap<K, V> extends HashMap<K, V> 
    implements ConcurrentMap<K, V> {}

I see that it breaks (gives wrong aggregations as the map isn't thread-safe) with parallel streams:

Map<Integer, Long> counts4 = IntStream.range(0, 1000000)
        .boxed()
        .parallel()
        .collect(
            Collectors.groupingByConcurrent(i -> i % 10, 
                                          FakeConcurrentMap::new, 
                                          Collectors.counting()));

Without .parallel(), results are consistently correct. So it seems that groupingByConcurrent goes with parallel streams.

But, as far as I can see, the following parallel stream collected with groupingBy always produces correct results:

Map<Integer, Long> counts3 = IntStream.range(0, 1000000)
        .boxed()
        .parallel()
        .collect(
            Collectors.groupingBy(i -> i % 10, 
                                  HashMap::new,
                                  Collectors.counting()));

So when is it correct to use groupingByConcurrent instead of groupingBy (surely that can't be just to get groupings as a concurrent map)?

ernest_k asked Mar 02 '19 19:03



1 Answer

All Collectors work just fine for parallel streams, but Collectors supporting direct concurrency (with Collector.Characteristics.CONCURRENT) are eligible for optimizations that others are not. groupingByConcurrent falls into this category.
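
To see which category a collector falls into, you can inspect its characteristics. This is only a small sketch; the class name CharacteristicsDemo and the helpers plain() and concurrent() are made up for illustration, and the classifier is borrowed from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
class CharacteristicsDemo {

    // Plain grouping collector, same classifier as in the question.
    static Collector<Integer, ?, Map<Integer, List<Integer>>> plain() {
        return Collectors.groupingBy(i -> i % 10);
    }

    // Concurrent counterpart.
    static Collector<Integer, ?, ConcurrentMap<Integer, List<Integer>>> concurrent() {
        return Collectors.groupingByConcurrent(i -> i % 10);
    }

    public static void main(String[] args) {
        // Prints the characteristics set each collector reports.
        System.out.println("groupingBy:           " + plain().characteristics());
        System.out.println("groupingByConcurrent: " + concurrent().characteristics());
    }
}
```

The second collector reports CONCURRENT and UNORDERED, the first reports neither. The stream implementation looks at these flags when it decides how to run a parallel collect.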

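(Roughly, what happens is that a non-concurrent collector breaks the input into per-thread pieces, creates an accumulator per thread, and then merges them at the end. That is why your groupingBy example is safe even with a plain HashMap: each HashMap is only ever touched by one thread at a time. A concurrent (and unordered) collector creates one accumulator and has several worker threads concurrently adding elements to that same accumulator, which is why the map handed to groupingByConcurrent has to be genuinely thread-safe and why your fake ConcurrentMap gives wrong counts.)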
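
One way to observe the difference is to count how many times the map factory is called. The sketch below assumes the OpenJDK stream implementation; the class name AccumulatorCountDemo, its methods, and the counter parameter are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of any library.
class AccumulatorCountDemo {
    static final int N = 1_000_000;

    // groupingBy: every chunk of the parallel stream asks the factory for
    // its own HashMap; the partial maps are merged at the end.
    static Map<Integer, Long> withGroupingBy(AtomicInteger mapsCreated) {
        Supplier<Map<Integer, Long>> factory = () -> {
            mapsCreated.incrementAndGet();
            return new HashMap<>();
        };
        return IntStream.range(0, N).boxed().parallel()
                .collect(Collectors.groupingBy(i -> i % 10, factory, Collectors.counting()));
    }

    // groupingByConcurrent: one shared map, which therefore has to be a
    // real thread-safe ConcurrentMap such as ConcurrentHashMap.
    static ConcurrentMap<Integer, Long> withGroupingByConcurrent(AtomicInteger mapsCreated) {
        Supplier<ConcurrentMap<Integer, Long>> factory = () -> {
            mapsCreated.incrementAndGet();
            return new ConcurrentHashMap<>();
        };
        return IntStream.range(0, N).boxed().parallel()
                .collect(Collectors.groupingByConcurrent(i -> i % 10, factory, Collectors.counting()));
    }

    public static void main(String[] args) {
        AtomicInteger plainMaps = new AtomicInteger();
        AtomicInteger concurrentMaps = new AtomicInteger();
        Map<Integer, Long> a = withGroupingBy(plainMaps);
        Map<Integer, Long> b = withGroupingByConcurrent(concurrentMaps);
        System.out.println("same result: " + a.equals(b));
        System.out.println("maps created by groupingBy: " + plainMaps.get());
        System.out.println("maps created by groupingByConcurrent: " + concurrentMaps.get());
    }
}
```

Both runs produce the same counts. The groupingBy run typically creates several maps (the exact number depends on how the stream gets split on your machine), while the groupingByConcurrent run creates a single one. Skipping the merge step is the optimization; whether it actually pays off depends on how much threads contend on the shared map, so it is worth measuring for your own data.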

Louis Wasserman answered Sep 30 '22 18:09
