In Aggregating with Streams, Brian Goetz compares populating a collection using Stream.collect() and doing the same using Stream.forEach(), with the following two snippets:
Set<String> uniqueStrings = strings.stream()
.collect(HashSet::new,
HashSet::add,
HashSet::addAll);
And,
Set<String> set = new HashSet<>();
strings.stream().forEach(s -> set.add(s));
Then he explains:
The key difference is that, with the forEach() version, multiple threads are trying to access a single result container simultaneously, whereas with parallel collect(), each thread has its own local result container, the results of which are merged afterward.
To my understanding, multiple threads would be working in the forEach() case only if the stream is parallel. However, in the example given, forEach() is operating on a sequential stream (no call to parallelStream()).
So, is it that forEach() always works in parallel, or should the code snippet call parallelStream() instead of stream()? (Or am I missing something?)
No, forEach() doesn't parallelize if the stream isn't parallel. I think he simplified the example for the sake of discussion.
As evidence, this code is inside the AbstractPipeline class's evaluate method (which is called from forEach):
return isParallel()
? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
: terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
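You can also check the flag that this branch depends on yourself. The following is only a minimal sketch: the class name, method names and sample data are made up for illustration, and it relies on ArrayList using the default Collection.stream(), which returns a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the article or the JDK.
class IsParallelDemo {
    // Made-up sample data.
    static final List<String> STRINGS = new ArrayList<>(Arrays.asList("a", "b", "a"));

    // stream() on an ArrayList yields a sequential stream.
    static boolean plainStreamIsParallel() {
        return STRINGS.stream().isParallel();
    }

    // Calling parallel() switches the pipeline to parallel mode.
    static boolean explicitParallelIsParallel() {
        return STRINGS.stream().parallel().isParallel();
    }

    public static void main(String[] args) {
        System.out.println(plainStreamIsParallel());      // prints false
        System.out.println(explicitParallelIsParallel()); // prints true
    }
}
```

So with the snippet exactly as written in the question, forEach() takes the evaluateSequential path and only one thread touches the HashSet.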
The whole quote goes as follows:
Just as reduction can parallelize safely provided the combining function is associative and free of interfering side effects, mutable reduction with Stream.collect() can parallelize safely if it meets certain simple consistency requirements (outlined in the specification for collect()).
And then what you've quoted:
The key difference is that, with the forEach() version, multiple threads are trying to access a single result container simultaneously, whereas with parallel collect(), each thread has its own local result container, the results of which are merged afterward.
Since the first sentence clearly speaks of parallelization, my understanding is that both forEach() and collect() are spoken of in the context of parallel streams.
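To make the contrast concrete, here is a rough sketch of the two snippets side by side, with collect() switched to a parallel stream as the quote seems to assume. The class and method names are hypothetical, and the forEach() version is deliberately left sequential, because a parallel forEach() adding to a plain HashSet would have several threads mutating one unsynchronized container.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, for illustration only.
class CollectVersusForEach {
    // Safe on a parallel stream: each worker fills its own HashSet
    // (supplier + accumulator), and the partial sets are merged with addAll.
    static Set<String> withParallelCollect(List<String> strings) {
        return strings.parallelStream()
                .collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    // Safe only because the stream is sequential: a single thread
    // adds to the one shared HashSet.
    static Set<String> withSequentialForEach(List<String> strings) {
        Set<String> set = new HashSet<>();
        strings.stream().forEach(s -> set.add(s));
        return set;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("a", "b", "a", "c", "b");
        // Both sets contain exactly a, b and c.
        System.out.println(withParallelCollect(strings).equals(withSequentialForEach(strings))); // prints true
    }
}
```

Replacing stream() with parallelStream() in withSequentialForEach is the situation the article warns about: the code would still compile, but the shared set could be corrupted or lose elements.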