Mutable reductions for parallelStream in Java 8

Joshua Bloch, in Effective Java (Third Edition), mentions that:

The operations performed by Stream’s collect method, which are known as mutable reductions, are not good candidates for parallelism because the overhead of combining collections is costly.

I read the docs on mutable reduction, but I am still not quite sure why mutable reduction is not a good candidate for parallelism. Is it because of synchronization?

As @Ravindra Ranwala points out (I also saw this in the Reduction, concurrency, and ordering docs):

It may actually be counterproductive to perform the operation in parallel. This is because the combining step (merging one Map into another by key) can be expensive for some Map implementations.

If so, then are there other major factors we need to care about that might result in low performance?

Hearen asked Jun 15 '18 04:06



1 Answer

No, it has nothing to do with synchronization. Suppose you have 1 million Person objects and need to find everyone who lives in New York. A typical stream pipeline would be:

List<Person> newYorkers = people.parallelStream()
    .filter(p -> p.getState().equals("NY"))
    .collect(Collectors.toList());

Consider a parallel execution of this query. Let's say we have 10 threads executing it in parallel. Each thread accumulates its own share of the results into a separate local container. Finally, the 10 result containers are merged to form one large container. This merge is costly, and it is an additional step that only the parallel execution introduces. Hence parallel execution may not always be faster. Sometimes sequential execution is faster than its parallel counterpart.
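To make the extra merge step visible, here is a minimal sketch that uses the three-argument Stream.collect (supplier, accumulator, combiner) and counts how often a container is created and how often two containers are merged. The Person class, the sample data, and the two counters are assumptions added purely for illustration; they are not part of the original example, and the exact counts in the parallel case depend on how the runtime splits the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ParallelCollectSketch {

    // Hypothetical stand-in for the Person class used in the pipeline above.
    static final class Person {
        private final String state;
        Person(String state) { this.state = state; }
        String getState() { return state; }
    }

    // Illustration-only counters: containers created and merges performed.
    static final AtomicInteger containers = new AtomicInteger();
    static final AtomicInteger merges = new AtomicInteger();

    // Assumed sample data: every other person lives in "NY".
    static List<Person> samplePeople(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> new Person(i % 2 == 0 ? "NY" : "CA"))
                .collect(Collectors.toList());
    }

    static List<Person> newYorkers(List<Person> people, boolean parallel) {
        containers.set(0);
        merges.set(0);

        // Supplier: creates a fresh result container.
        Supplier<List<Person>> supplier = () -> {
            containers.incrementAndGet();
            return new ArrayList<>();
        };
        // Accumulator: adds one element to a container.
        BiConsumer<List<Person>, Person> accumulator = List::add;
        // Combiner: merges the right container into the left one.
        BiConsumer<List<Person>, List<Person>> combiner = (left, right) -> {
            merges.incrementAndGet();
            left.addAll(right);
        };

        Stream<Person> stream = parallel ? people.parallelStream() : people.stream();
        return stream
                .filter(p -> "NY".equals(p.getState()))
                .collect(supplier, accumulator, combiner);
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople(1_000_000);

        newYorkers(people, false);
        System.out.println("sequential: containers=" + containers.get() + ", merges=" + merges.get());

        newYorkers(people, true);
        System.out.println("parallel:   containers=" + containers.get() + ", merges=" + merges.get());
    }
}
```

In the sequential run there is a single container and the combiner is never called. In the parallel run several containers are created and then merged with addAll, which copies elements that had already been collected once. That copying is the overhead the quote refers to.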

So always start with sequential execution. Switch to the parallel counterpart at some later point only if it turns out to be worthwhile.
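One rough way to check whether the switch is worthwhile is to run the same pipeline in both modes, confirm the results match, and compare timings. The sketch below assumes a hypothetical workload (collecting the even numbers below n) and uses a single un-warmed System.nanoTime measurement, so treat the numbers only as a hint; a proper benchmark harness such as JMH would give more trustworthy results.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SequentialFirstSketch {

    // Hypothetical workload: collect the even numbers below n into a List.
    static List<Integer> evens(int n, boolean parallel) {
        IntStream source = IntStream.range(0, n);
        if (parallel) {
            source = source.parallel();
        }
        return source
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // Rough wall-clock time of a single run, in milliseconds (no warm-up).
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 5_000_000;

        // Same pipeline and same result; only the execution mode differs.
        boolean sameResult = evens(n, false).equals(evens(n, true));

        long sequentialMs = elapsedMillis(() -> evens(n, false));
        long parallelMs = elapsedMillis(() -> evens(n, true));

        System.out.println("same result: " + sameResult);
        System.out.println("sequential: " + sequentialMs + " ms, parallel: " + parallelMs + " ms");
    }
}
```

Which mode wins depends on the data size, the cost of each element's processing, the number of cores, and the cost of merging the result containers, so the outcome on one machine says little about another.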

Ravindra Ranwala answered Sep 24 '22 14:09
