Why should I use concurrent characteristic in parallel stream with collect?

Tags: java, multithreading, concurrency, java-8, java-stream

Why should I use the concurrent characteristic in a parallel stream with collect:

List<Integer> list =
        Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 4)));

Map<Integer, Integer> collect = list.stream().parallel()
        .collect(Collectors.toConcurrentMap(k -> k, v -> v, (c, c2) -> c + c2));

And not:

Map<Integer, Integer> collect = list.stream().parallel()
        .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));

In other words, what are the side effects of not using this characteristic? Is it useful for the internal stream operations?

asked Dec 08 '16 14:12

heaprc

2 Answers

These two collectors operate in a fundamentally different way.

First, the Stream framework will split the workload into independent chunks that can be processed in parallel. That is why you don’t need a special collection as the source; the synchronizedList is unnecessary.

With a non-concurrent collector, each chunk will be processed by creating a local container (here, a Map) using the Collector’s supplier and accumulating the chunk’s elements into that local container (putting entries). These partial results then have to be merged, i.e. one map has to be put into the other, to get the final result.

A concurrent collector supports accumulating concurrently, so only one ConcurrentMap will be created and all threads accumulate into that map at the same time. So after completion, no merging step is required, as there is only one map.
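Which of the two strategies the framework may pick is driven by the flags a collector reports. The following is a minimal sketch for inspecting them; the class and method names are made up for illustration, only the `Collectors` calls and `characteristics()` come from the JDK.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class; only the Collectors calls come from the JDK.
public class CollectorCharacteristicsDemo {

    // Characteristics reported by the non-concurrent collector.
    static Set<Collector.Characteristics> plainMapCharacteristics() {
        return Collectors.<Integer, Integer, Integer>toMap(k -> k, v -> v, Integer::sum)
                .characteristics();
    }

    // Characteristics reported by the concurrent collector.
    static Set<Collector.Characteristics> concurrentMapCharacteristics() {
        return Collectors.<Integer, Integer, Integer>toConcurrentMap(k -> k, v -> v, Integer::sum)
                .characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toMap:           " + plainMapCharacteristics());
        System.out.println("toConcurrentMap: " + concurrentMapCharacteristics());
    }
}
```

The `toConcurrentMap` collector is documented as concurrent and unordered, so its set should contain `CONCURRENT` and `UNORDERED`, while the `toMap` set should not contain `CONCURRENT`.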

So both collectors are thread-safe, but they might exhibit entirely different performance characteristics, depending on the task. If the Stream’s workload before collecting the result is heavy, the differences might be negligible. If, as in your example, there is no relevant work before the collect operation, the outcome heavily depends on how often mappings have to be merged, i.e. how often the same key occurs, and on how the actual target ConcurrentMap deals with contention in the concurrent case.

If you mostly have distinct keys, the merging step of a non-concurrent collector can be as expensive as the previous putting, destroying any benefit of the parallel processing. But if you have lots of duplicate keys, requiring merging of the values, the contention on the same key may degrade the concurrent collector’s performance.

So there’s no simple “which is better” answer (if there were such an answer, why bother adding the other variant?). It depends on your actual operation. You can use the expected scenario as a starting point for selecting one, but you should then measure with real-life data. Since both produce equivalent results here, you can change your choice at any time.
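To illustrate that the two collectors are interchangeable for this kind of task, here is a small sketch that runs both over the same input and compares the results. The class name, method names, and input values are assumptions chosen for the example. Note that the merge function (addition) is associative and commutative, so the order in which values are merged does not matter; with an order-sensitive merge function the unordered concurrent collector could give a different result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class comparing the results of both collectors.
public class EquivalentResultsDemo {

    // Non-concurrent: per-chunk maps are filled and then merged.
    static Map<Integer, Integer> withToMap(List<Integer> source) {
        return source.parallelStream()
                .collect(Collectors.toMap(k -> k, v -> v, Integer::sum));
    }

    // Concurrent: all threads accumulate into one shared ConcurrentMap.
    static Map<Integer, Integer> withToConcurrentMap(List<Integer> source) {
        return source.parallelStream()
                .collect(Collectors.toConcurrentMap(k -> k, v -> v, Integer::sum));
    }

    public static void main(String[] args) {
        // A plain list is enough as the source; duplicates exercise the merge function.
        List<Integer> source = Arrays.asList(1, 2, 4, 2, 4, 4);

        Map<Integer, Integer> plain = withToMap(source);
        Map<Integer, Integer> concurrent = withToConcurrentMap(source);

        // Both maps hold the mappings 1 -> 1, 2 -> 4 and 4 -> 12.
        System.out.println(plain.equals(concurrent));
    }
}
```

Swapping one collector for the other only changes how the map is built, so it is a cheap experiment to run when measuring with real data.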

answered Nov 15 '22 20:11

Holger

First of all, I gave a +1 to Holger's answer; it is a good one. I would try to simplify it just a bit by saying that:

CONCURRENT -> multiple threads throw data at the same container in no particular order (ConcurrentHashMap)

NON-CONCURRENT -> multiple threads combine their intermediate results.

The easiest way to understand it (IMHO) is to write a custom collector and play with each of its methods: supplier, accumulator, combiner.
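A minimal sketch of such an experiment follows. Everything named here (the class, the counters, the `summing` and `run` helpers, the input range) is hypothetical and exists only to observe what the framework calls. It uses a ConcurrentHashMap as the container in both modes so that only the declared characteristics differ; in the non-concurrent mode a plain HashMap would also work.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical demo: counts how often supplier and combiner are invoked.
public class CountingCollectorDemo {

    static final AtomicInteger supplierCalls = new AtomicInteger();
    static final AtomicInteger combinerCalls = new AtomicInteger();

    // Sums values per key (value % 10); the flag decides which characteristics are declared.
    static Collector<Integer, ConcurrentHashMap<Integer, Integer>, ConcurrentHashMap<Integer, Integer>>
            summing(boolean concurrent) {
        Collector.Characteristics[] traits = concurrent
                ? new Collector.Characteristics[] {
                        Collector.Characteristics.CONCURRENT,
                        Collector.Characteristics.UNORDERED }
                : new Collector.Characteristics[0];
        return Collector.<Integer, ConcurrentHashMap<Integer, Integer>>of(
                () -> {                                   // supplier: creates a container
                    supplierCalls.incrementAndGet();
                    return new ConcurrentHashMap<>();
                },
                (map, x) -> map.merge(x % 10, x, Integer::sum),   // accumulator
                (left, right) -> {                        // combiner: merges two containers
                    combinerCalls.incrementAndGet();
                    right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                    return left;
                },
                traits);
    }

    // Resets the counters and collects 1..1000 in parallel.
    static Map<Integer, Integer> run(boolean concurrent) {
        supplierCalls.set(0);
        combinerCalls.set(0);
        return IntStream.rangeClosed(1, 1000).boxed().parallel()
                .collect(summing(concurrent));
    }

    public static void main(String[] args) {
        for (boolean concurrent : new boolean[] { false, true }) {
            Map<Integer, Integer> result = run(concurrent);
            System.out.println("concurrent=" + concurrent
                    + " suppliers=" + supplierCalls.get()
                    + " combiners=" + combinerCalls.get()
                    + " keys=" + result.size());
        }
    }
}
```

With the concurrent characteristics declared, the supplier should be called once and the combiner not at all; without them, the counts depend on how the framework splits the work on the machine at hand.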

This was already sort-of covered here

answered Nov 15 '22 20:11

Eugene
