As I see it, the obvious code when using Java 8 Streams, whether they be "object" streams or primitive streams (that is, IntStream and friends), would be to just use:
someStreamableResource.stream().whatever()
But then, quite a few "streamable resources" also have .parallelStream().
What isn't clear when reading the javadoc is whether .stream() streams are always sequential, and whether .parallelStream() streams are always parallel...
And then there is Spliterator, and in particular its .characteristics(), one of them being that it can be CONCURRENT, or even IMMUTABLE.
My gut feeling is that whether a Stream is parallel by default, or can be parallel at all, is in fact determined by its underlying Spliterator...
Am I on the right track? I have read, and read again, the javadocs, and still cannot come up with a clear answer to this question...
By default, any stream operation in Java is processed sequentially, unless explicitly specified as parallel.
Parallel streams, available since Java 8, are meant to make use of multiple processor cores. Normally, Java code has a single flow of execution and runs sequentially.
Parallel streams divide the given task into subtasks and run them on different threads, which can use multiple cores of the computer. Sequential streams, on the other hand, work much like a for loop on a single thread.
First, through the lens of specification. Whether a stream is parallel or sequential is part of a stream's state. Stream-creation methods should specify whether they create a sequential or parallel stream (and most in the JDK do), but they are not required to say so. If your stream source doesn't say, don't assume. If someone passes you a stream, don't assume.
Parallel streams are allowed to fall back to sequential at their discretion (since a sequential implementation is a parallel implementation, just a potentially imperfect one); the opposite is not true.
Now, through the lens of implementation. In the stream-creation methods in Collections and other JDK classes, we stick to a discipline of "create a sequential stream unless the user explicitly asks for parallelism". (Other libraries, however, make different choices. If they're polite, they'll specify their behavior.)
The relationship between stream parallelism and Spliterator only goes in one direction. A Spliterator can refuse to split -- effectively denying any parallelism -- but it can't demand that a client split it. So an uncooperative Spliterator can undermine parallelism, but not determine it.
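To make the "refuse to split" point concrete, here is a minimal sketch. The class NoSplitDemo and its nested NoSplit spliterator are hypothetical names made up for this illustration; only Spliterator and StreamSupport.stream come from the JDK. The stream is created as parallel, yet because trySplit() always returns null, there is only ever one chunk of work, so a single thread ends up processing every element.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class NoSplitDemo {
    /** Hypothetical spliterator over 0..limit-1 that never agrees to split. */
    static final class NoSplit implements Spliterator<Integer> {
        private final int limit;
        private int next = 0;

        NoSplit(int limit) { this.limit = limit; }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= limit) return false;
            action.accept(next++);
            return true;
        }

        // Returning null means "I cannot be split", which denies any parallelism.
        @Override
        public Spliterator<Integer> trySplit() { return null; }

        @Override
        public long estimateSize() { return limit - next; }

        @Override
        public int characteristics() { return ORDERED | SIZED; }
    }

    /** Asks for a parallel stream on top of the non-splitting source. */
    static Stream<Integer> parallelStreamOver(int limit) {
        return StreamSupport.stream(new NoSplit(limit), true);
    }

    /** Collects the threads that actually processed elements. */
    static Set<Thread> threadsUsed(int limit) {
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        parallelStreamOver(limit).forEach(i -> threads.add(Thread.currentThread()));
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("isParallel: " + parallelStreamOver(10).isParallel());
        System.out.println("threads used: " + threadsUsed(10_000).size());
    }
}
```

The stream still reports isParallel() as true: the flag is part of the stream's state, while the Spliterator only limits how much of that parallelism can be realized.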
The API doesn't have much to say on the matter:
Streams are created with an initial choice of sequential or parallel execution. (For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one.)
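If in doubt about a particular source, the stream itself can be asked via isParallel(). The following is a small sketch (StreamModeCheck is a name invented for this example) that checks an ArrayList and an IntStream.range; other sources may behave differently, which is exactly why checking beats assuming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class StreamModeCheck {
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("a", "b", "c"));
    }

    /** Collection.stream() on an ArrayList: expected to be sequential. */
    static boolean plainStreamIsParallel() {
        return sample().stream().isParallel();
    }

    /** Collection.parallelStream() on an ArrayList: expected to be parallel. */
    static boolean parallelStreamIsParallel() {
        return sample().parallelStream().isParallel();
    }

    /** Primitive streams follow the same "sequential unless asked" discipline. */
    static boolean intRangeIsParallel() {
        return IntStream.range(0, 10).isParallel();
    }

    public static void main(String[] args) {
        System.out.println(plainStreamIsParallel());    // false
        System.out.println(parallelStreamIsParallel()); // true
        System.out.println(intRangeIsParallel());       // false
    }
}
```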
Regarding your line of reasoning that some intermediate operations may not be thread safe, you may want to read the package summary. The package summary discusses intermediate operations, stateful vs stateless, and how to properly use a Stream in some depth.
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
Behavioral parameters are the arguments (typically lambdas or method references) passed to stream operations.
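As an illustration of what "no side effects" looks like in practice, here is a sketch (SideEffectFreeDemo is a made-up name) that computes squares on a parallel stream. The lambda passed to map touches no shared state, and the results are gathered with collect rather than by adding to a shared list from inside forEach, which would be the kind of side effect the documentation warns about.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffectFreeDemo {
    /**
     * Squares 0..n-1 in parallel without touching shared mutable state.
     * The discouraged alternative would be something like:
     *   List<Integer> out = new ArrayList<>();
     *   IntStream.range(0, n).parallel().forEach(i -> out.add(i * i)); // unsafe
     */
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)   // stateless, side-effect-free behavioral parameter
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(10));
    }
}
```

Because the stream is ordered and collect respects encounter order, the result list comes back in the same order as a sequential run would produce.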
"the API cannot make any assumptions"
The API can make any assumption it wishes. The onus is on the user of the API to meet those assumptions. However, assumptions may limit usability. The Stream API discourages the creation of a stateless intermediate operation that is not thread-safe. Since it is discouraged instead of prohibited, most Streams will be sequential "by default".
Well, answer to self...
After thinking about it a little more seriously (go figure, such things only happen after I actually ask the question), I actually came up with a reason why...
Intermediate operations may NOT be thread safe; as such, the API cannot make any assumptions, hence if the user wants a parallel stream, it has to explicitly ask for it and ensure that all intermediate operations used in the stream are thread safe.
There is however the somewhat misleading case of Collectors; since a Collector cannot know in advance whether it will be called as a terminal operation on a stream which is parallel or not, the contract makes it clear that, "just to be safe", any Collector must be safe to use on a parallel stream.
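A sketch of what this means for a hand-written Collector follows. ListCollectorDemo and toPlainList are hypothetical names for this example; Collector.of is the real JDK factory. The container is a plain ArrayList with no synchronization. As far as the Collector javadoc describes it, for a collector that is not marked CONCURRENT the stream framework keeps each intermediate container confined to one thread at a time and merges partial results with the combiner, so supplying a correct combiner is what lets the same collector serve both sequential and parallel streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class ListCollectorDemo {
    /** Hypothetical collector for illustration: accumulates into a plain ArrayList. */
    static <T> Collector<T, List<T>, List<T>> toPlainList() {
        return Collector.<T, List<T>>of(
                () -> new ArrayList<>(),                  // fresh container per chunk of work
                (list, item) -> list.add(item),           // accumulate one element
                (left, right) -> {                        // merge two partial results
                    left.addAll(right);
                    return left;
                });
    }

    /** Runs the same collector on a sequential or a parallel stream. */
    static List<Integer> collect(int n, boolean parallel) {
        IntStream range = IntStream.range(0, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed().collect(toPlainList());
    }

    public static void main(String[] args) {
        System.out.println(collect(10, false).equals(collect(10, true)));
    }
}
```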
It is mentioned here: "When you create a stream, it is always a serial stream unless otherwise specified."
And here: "It is allowable for this method (parallelStream) to return a sequential stream."
CONCURRENT and IMMUTABLE aren't (directly) related to this. They specify, respectively, whether the underlying collection can be modified without rendering the spliterator invalid, or whether it is immutable. The feature of Spliterator that does pretty much define the behavior of parallelStream is trySplit. Terminal operations on a parallel stream will eventually invoke trySplit, and whatever that implementation does will at the end of the day define what parts, if any, of the data are processed in parallel.
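Both points can be poked at directly. The sketch below (SplitInspect is an invented name) calls trySplit by hand on an ArrayList spliterator to show how the source hands off part of its elements, and queries the CONCURRENT and IMMUTABLE characteristics on a few JDK collections. The exact split sizes are an implementation detail of ArrayList and are shown here only as an observation, not a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

class SplitInspect {
    /** Splits once and returns {size of split-off prefix, size of what remains}. */
    static long[] splitSizes(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // may be null if the source refuses
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    static boolean isConcurrent(Collection<?> c) {
        return c.spliterator().hasCharacteristics(Spliterator.CONCURRENT);
    }

    static boolean isImmutable(Collection<?> c) {
        return c.spliterator().hasCharacteristics(Spliterator.IMMUTABLE);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7));
        System.out.println(Arrays.toString(splitSizes(list)));
        System.out.println("ArrayList CONCURRENT: " + isConcurrent(list));
        System.out.println("ConcurrentLinkedQueue CONCURRENT: "
                + isConcurrent(new ConcurrentLinkedQueue<>(list)));
        System.out.println("CopyOnWriteArrayList IMMUTABLE: "
                + isImmutable(new CopyOnWriteArrayList<>(list)));
    }
}
```

Note that none of these characteristics changes what isParallel() reports; they describe the source, not the execution mode.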
This part is not constrained by the specification right now; however, the short answer is NO.
There are parallelStream() and stream() functions, but they just give you access to parallel or sequential implementations of the common basic operations used to process the stream.
Currently the runtime can't assume that your operations are thread safe without an explicit parallelStream() or parallel() call, so the default implementation of stream() has sequential behavior.
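For completeness, here is a short sketch of the explicit opt-in (ModeSwitchDemo is a name made up for this example). parallel() and sequential() are the real BaseStream methods; they flip the execution mode of an existing stream, and the mode in effect when the terminal operation starts is the one that applies to the whole pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ModeSwitchDemo {
    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
    }

    /** stream() followed by an explicit parallel() call. */
    static boolean afterParallel() {
        return sample().stream().parallel().isParallel();
    }

    /** parallelStream() switched back with sequential(). */
    static boolean afterSequential() {
        return sample().parallelStream().sequential().isParallel();
    }

    /** The result of a side-effect-free reduction does not depend on the mode. */
    static int sum(boolean parallel) {
        return (parallel ? sample().parallelStream() : sample().stream())
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(afterParallel());   // true
        System.out.println(afterSequential()); // false
        System.out.println(sum(false) == sum(true));
    }
}
```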