Joshua Bloch, in Effective Java (Third Edition), mentions that:
The operations performed by Stream's collect method, which are known as mutable reductions, are not good candidates for parallelism because the overhead of combining collections is costly.
I read the docs on mutable reduction, but I am still not quite sure why it is a poor candidate for parallelism. Is it the synchronization?
As @Ravindra Ranwala points out (I also saw this in the "Reduction, concurrency, and ordering" docs):
It may actually be counterproductive to perform the operation in parallel. This is because the combining step (merging one Map into another by key) can be expensive for some Map implementations.
If so, are there other major factors we need to care about that might result in low performance?
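To make the quoted point concrete, here is a small sketch (with made-up data) of a map-producing collector. In a parallel run, each worker thread builds its own partial Map, and the framework must then merge those maps key by key, touching every entry a second time; that merge is the cost the docs are warning about:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical data: group city names by their first letter.
List<String> cities = List.of("Albany", "Austin", "Boston", "Buffalo");

// With parallelStream(), each thread accumulates into its own partial Map;
// the partial maps are then merged key by key into the final result.
Map<Character, List<String>> byInitial = cities.parallelStream()
        .collect(Collectors.groupingBy(c -> c.charAt(0)));
```

Note that the synchronization question is a red herring here: the per-thread containers are thread-confined, so no locking happens during accumulation; the cost is purely in the merge.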
No, it has nothing to do with synchronization. Consider that you have one million Person objects and need to find all the people who live in New York. A typical stream pipeline would be:
List<Person> newYorkers = people.parallelStream()
        .filter(p -> p.getState().equals("NY"))
        .collect(Collectors.toList());
Consider a parallel execution of this query. Let's say 10 threads execute it in parallel. Each thread accumulates its own data set into a separate local container; finally, the 10 result containers are merged to form one large container. This merge is costly and is an additional step introduced by the parallel execution. Hence parallel execution may not always be faster; sometimes sequential execution is faster than its parallel counterpart.
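The merge step can be made explicit. Collectors.toList() behaves roughly like the three-argument collect below: the supplier creates one local container per thread, the accumulator fills it, and the combiner copies partial lists together, which is exactly the extra step described above. (This sketch uses plain state strings instead of the hypothetical Person class so it runs standalone.)

```java
import java.util.*;
import java.util.stream.*;

// Stand-in for p.getState() values from the Person example above.
List<String> states = List.of("NY", "CA", "NY", "TX");

List<String> newYorkers = states.parallelStream()
        .filter(s -> s.equals("NY"))
        .collect(ArrayList::new,       // supplier: one fresh container per thread
                 ArrayList::add,       // accumulator: add into the thread-local list
                 ArrayList::addAll);   // combiner: merge partial lists (the extra cost)
```

For cheap combiners like ArrayList.addAll the merge may be tolerable; for map-merging collectors it is often not, which is why the cost depends on the container type.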
So always start with a sequential execution. Only if that turns out to be too slow should you move to its parallel counterpart at some later point in time.