
Does Stream.forEach() always work in parallel?

In Aggregating with Streams, Brian Goetz compares populating a collection using Stream.collect() and doing the same using Stream.forEach(), with the following two snippets:

Set<String> uniqueStrings = strings.stream()
                                   .collect(HashSet::new,
                                            HashSet::add,
                                            HashSet::addAll);

And,

Set<String> set = new HashSet<>();
strings.stream().forEach(s -> set.add(s));

Then he explains:

The key difference is that, with the forEach() version, multiple threads are trying to access a single result container simultaneously, whereas with parallel collect(), each thread has its own local result container, the results of which are merged afterward.

To my understanding, multiple threads would be working in the forEach() case only if the stream is parallel. However, in the example given, forEach() is operating on a sequential stream (no call to parallelStream()).

So, does forEach() always work in parallel, or should the code snippet call parallelStream() instead of stream()? (Or am I missing something?)

ARX asked Jan 21 '17 22:01


2 Answers

No, forEach() doesn't parallelize if the stream isn't parallel. I think he simplified the example for the sake of discussion.

As evidence, this code is in the evaluate method of the AbstractPipeline class (which forEach() calls):

return isParallel()
        ? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
        : terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
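
The same point can be checked from user code without reading the JDK source. The following is only a minimal sketch: the class name ForEachThreads and the helper threadsUsed are hypothetical names made up for illustration, and the sample list is arbitrary. It records which threads run the forEach() action for a sequential and for a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the JDK or of the original answer.
class ForEachThreads {

    // Returns the names of all threads that executed the forEach() action.
    static Set<String> threadsUsed(Stream<String> stream) {
        // A concurrent set, because a parallel stream may call add() from several threads.
        Set<String> names = ConcurrentHashMap.newKeySet();
        stream.forEach(s -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");

        System.out.println(strings.stream().isParallel());         // false
        System.out.println(strings.parallelStream().isParallel()); // true

        // Sequential: only the calling thread runs the action.
        System.out.println(threadsUsed(strings.stream()));
        // Parallel: the set of threads varies from run to run.
        System.out.println(threadsUsed(strings.parallelStream()));
    }
}
```

With stream(), the only thread recorded should be the caller's. With parallelStream(), the recorded threads depend on the machine and the run, so no particular output is claimed here.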

Jeanne Boyarsky answered Oct 29 '22 23:10


The whole quote goes as follows:

Just as reduction can parallelize safely provided the combining function is associative and free of interfering side effects, mutable reduction with Stream.collect() can parallelize safely if it meets certain simple consistency requirements (outlined in the specification for collect()).

And then what you've quoted:

The key difference is that, with the forEach() version, multiple threads are trying to access a single result container simultaneously, whereas with parallel collect(), each thread has its own local result container, the results of which are merged afterward.

Since the first sentence clearly speaks of parallelization, my understanding is that both forEach() and collect() are being discussed in the context of parallel streams.
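
To make the parallel reading concrete, here is a hedged sketch of the two approaches run on a parallel stream. The class name ParallelAggregation, the two helper methods, and the test data are assumptions made for illustration. The forEach() variant uses a concurrent set (ConcurrentHashMap.newKeySet()) instead of the article's plain HashSet, because a plain HashSet shared across threads is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not taken from the article.
class ParallelAggregation {

    // Mutable reduction: each worker fills its own HashSet, and addAll merges the partial results.
    static Set<String> viaCollect(List<String> strings) {
        Set<String> uniqueStrings = strings.parallelStream()
                                           .collect(HashSet::new,
                                                    HashSet::add,
                                                    HashSet::addAll);
        return uniqueStrings;
    }

    // forEach() writes into one shared container, so that container must be thread-safe.
    static Set<String> viaForEach(List<String> strings) {
        Set<String> set = ConcurrentHashMap.newKeySet();
        strings.parallelStream().forEach(set::add);
        return set;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            strings.add("s" + (i % 1_000)); // 1,000 distinct values, each repeated 10 times
        }
        System.out.println(viaCollect(strings).size()); // 1000
        System.out.println(viaForEach(strings).size()); // 1000
    }
}
```

Replacing the concurrent set in viaForEach with a plain HashSet would reproduce the problem the quote describes: several threads mutating a single result container at once, with unpredictable results.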

lexicore answered Oct 29 '22 23:10