From the Stream javadoc:
Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution.
My assumptions:
Assuming these assumptions are valid (nothing wrong with a bit of meta-assumption), what's the value in having the execution mode exposed in the api?
It seems like you should just be able to declare a Stream
, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or the JVM itself as a function of the cores available at runtime, the size of the problem, etc.
Sure, assuming parallel streams also work on a single core machine, perhaps just always using a parallel stream achieves this. But this is really ugly - why have explicit references to parallel streams in my code when it's the default option?
Even if there is a scenario where you'd deliberately want to hard code the use of a sequential stream - why is there not just a sub-interface SequentialStream
for that purpose, rather than polluting Stream
with an execution mode switch?
It seems like you should just be able to declare a Stream, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or the JVM itself as a function of the cores available at runtime, the size of the problem, etc.
The reality is that a) streams are a library, and have no special JVM magic, and b) you can't really design a library smart enough to automagically figure out what the right decision is in this particular case. There's no sensible way to estimate how costly a particular function will be without running it -- even if you could introspect its implementation, which you can't -- and now you're introducing a benchmark into every stream operation, trying to figure out if parallelizing it will be worth the cost of the parallelism overhead. That's just not practical, especially given that you don't know in advance how bad the parallelism overhead is, either.
A parallel stream is always preferable, given appropriate number of cores and problem size to justify the overhead, due to the performance gains.
Not always, in practice. Some tasks are just so small that they're not worth parallelizing, and parallelism does always have some overhead. (And frankly, most programmers tend to overestimate the usefulness of parallelism, slapping it everywhere when it's really hurting performance.)
Basically, it's a hard enough problem that you basically have to shove it off onto the programmer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With