
Can a Java 8 `Stream` be parallel without you even asking for it?

As I see it, the obvious code when using Java 8 streams, whether "object" streams or primitive streams (that is, IntStream and friends), would be to just use:

someStreamableResource.stream().whatever()

But then, quite a few "streamable resources" also have .parallelStream().

What isn't clear when reading the javadoc is whether .stream() streams are always sequential, and whether .parallelStream() streams are always parallel...

And then there is Spliterator, and in particular its .characteristics(), one of them being that it can be CONCURRENT, or even IMMUTABLE.

My gut feeling is that whether a Stream is parallel by default, or can be parallel at all, is determined by its underlying Spliterator...

Am I on the right track? I have read, and read again, the javadocs, and still cannot come up with a clear answer to this question...

fge asked Jan 14 '15 01:01




5 Answers

First, through the lens of specification. Whether a stream is parallel or sequential is part of a stream's state. Stream-creation methods should specify whether they create a sequential or parallel stream (and most in the JDK do), but they are not required to say so. If your stream source doesn't say, don't assume. If someone passes you a stream, don't assume.

Parallel streams are allowed to fall back to sequential at their discretion (since a sequential implementation is a parallel implementation, just a potentially imperfect one); the opposite is not true.

Now, through the lens of implementation. In the stream-creation methods in Collections and other JDK classes, we stick to a discipline of "create a sequential stream unless the user explicitly asks for parallelism". (Other libraries, however, make different choices. If they're polite, they'll specify their behavior.)
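The "sequential unless asked" discipline can be observed directly with `isParallel()`. The following is only a minimal sketch; the class name `DefaultModeDemo` is made up for illustration, and the outputs noted in comments are for the JDK's own factories, not for third-party stream sources.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class DefaultModeDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");

        // Collection.stream() is documented to return a sequential stream.
        System.out.println(words.stream().isParallel());          // false

        // Collection.parallelStream() returns a "possibly parallel" stream;
        // the default implementation used here reports true.
        System.out.println(words.parallelStream().isParallel());  // true

        // Other JDK factories are sequential unless asked otherwise.
        System.out.println(Stream.of(1, 2, 3).isParallel());      // false
        System.out.println(IntStream.range(0, 10).isParallel());  // false
    }
}
```

For a stream obtained from anywhere else, calling `isParallel()` is the way to find out rather than assuming.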

The relationship between stream parallelism and Spliterator only goes in one direction. A Spliterator can refuse to split -- effectively denying any parallelism -- but it can't demand that a client split it. So an uncooperative Spliterator can undermine parallelism, but not determine it.
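To illustrate the "uncooperative Spliterator" case, here is a rough sketch of a spliterator whose `trySplit()` always returns `null`. The names `NoSplitDemo` and `CountingSpliterator` are hypothetical and exist only for this example. The stream built on it still reports itself as parallel, but since nothing can be split off, all elements end up being traversed by a single task.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class NoSplitDemo {

    /** Hypothetical spliterator over 1..limit that never agrees to split. */
    static final class CountingSpliterator implements Spliterator<Integer> {
        private final int limit;
        private int next = 1;

        CountingSpliterator(int limit) { this.limit = limit; }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next > limit) return false;
            action.accept(next++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            return null; // refuse to split: no work can be handed to another task
        }

        @Override
        public long estimateSize() { return limit - next + 1; }

        @Override
        public int characteristics() { return ORDERED | SIZED | NONNULL; }
    }

    static int sumInParallelMode(int limit) {
        // The 'true' flag asks StreamSupport for a parallel stream.
        Stream<Integer> s = StreamSupport.stream(new CountingSpliterator(limit), true);
        return s.mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Stream<Integer> s = StreamSupport.stream(new CountingSpliterator(5), true);
        System.out.println(s.isParallel());       // true: the mode is stream state
        System.out.println(sumInParallelMode(5)); // 15: still the correct result
    }
}
```

The result is correct either way; the spliterator only limits how much of the work can actually be spread out.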

like image 134

Brian Goetz answered Sep 30 '22 01:09



The API doesn't have much to say on the matter:

Streams are created with an initial choice of sequential or parallel execution. (For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one.)

Regarding your line of reasoning that some intermediate operations may not be thread safe, you may want to read the package summary. The package summary discusses intermediate operations, stateful vs stateless, and how to properly use a Stream in some depth.

Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.

Behavioral parameters are the arguments (typically lambdas or method references) given to stream operations, including the stateless intermediate ones.
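As a hedged illustration of why side effects in behavioral parameters are discouraged, compare the two hypothetical helpers below (`SideEffectDemo` and its method names are invented for this sketch). The first mutates a shared `ArrayList` from a lambda, which is a data race once the stream runs in parallel; the second keeps the lambda stateless and lets the stream build the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SideEffectDemo {

    // Discouraged: the lambda mutates a shared, non-thread-safe list.
    // On a parallel stream this is a data race; elements may be lost
    // or an exception may be thrown. Shown for contrast, not called below.
    static List<Integer> squaresWithSideEffect(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        input.parallelStream().forEach(x -> out.add(x * x));
        return out;
    }

    // Preferred: no shared state; the stream assembles the result itself.
    static List<Integer> squares(List<Integer> input) {
        return input.parallelStream()
                    .map(x -> x * x)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 1000)
                                       .boxed()
                                       .collect(Collectors.toList());
        System.out.println(squares(input).size()); // 1000
    }
}
```

The stateless version behaves the same whether the stream is sequential or parallel, which is exactly what makes it safe to hand to code that may switch modes.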

"the API cannot make any assumptions"

The API can make any assumption it wishes. The onus is on the user of the API to meet those assumptions. However, assumptions may limit usability. The Stream API discourages the creation of a stateless intermediate operation that is not thread-safe. Since it is discouraged instead of prohibited, most Streams will be sequential "by default".

Jeffrey answered Sep 30 '22 01:09



Well, answer to self...

After thinking about it a little more seriously (go figure, such things only happen after I actually ask the question), I actually came up with a reason why...

Intermediate operations may NOT be thread safe; as such, the API cannot make any assumptions. Hence, if the user wants a parallel stream, they have to ask for it explicitly and ensure that all intermediate operations used in the stream are thread safe.

There is, however, the somewhat misleading case of Collectors; since a Collector cannot know in advance whether it will be used in a terminal operation on a stream which is parallel or not, the contract makes it clear that "just to be safe", any Collector must be thread safe.
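A small sketch of the "collector does not know the mode" point: the same collector instance is applied to a sequential and a parallel stream. `CollectorDemo` and `toArrayList()` are hypothetical names. One caveat worth stating: per the `Collector` javadoc, for collectors not marked `CONCURRENT` the framework keeps each partial result container confined to one thread and merges them through the combiner, so the container itself (a plain `ArrayList` here) does not need its own synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CollectorDemo {

    // Hypothetical collector: gathers elements into an ArrayList.
    static <T> Collector<T, List<T>, List<T>> toArrayList() {
        return Collector.of(
            ArrayList::new,                                        // supplier: new container per partition
            List::add,                                             // accumulator: add one element
            (left, right) -> { left.addAll(right); return left; }  // combiner: merge two partitions
        );
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2, 5, 4);

        List<Integer> sequential = data.stream().collect(toArrayList());
        List<Integer> parallel = data.parallelStream().collect(toArrayList());

        // Same collector, same encounter order, same result in both modes.
        System.out.println(sequential.equals(parallel)); // true
    }
}
```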

fge answered Sep 30 '22 03:09



It is mentioned here: "When you create a stream, it is always a serial stream unless otherwise specified." And here: "It is allowable for this method (parallelStream) to return a sequential stream."

CONCURRENT and IMMUTABLE aren't (directly) related to this. They specify, respectively, whether the underlying collection can be modified without rendering the spliterator invalid, or whether it is immutable. The feature of a spliterator that pretty much defines the behavior of parallelStream is trySplit. Terminal operations on a parallel stream will eventually invoke trySplit, and whatever that implementation does will, at the end of the day, define what parts, if any, of the data are processed in parallel.
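The distinction can be poked at with a few JDK collections. This is only a sketch (`CharacteristicsDemo` is an invented name): it checks the characteristics that `CopyOnWriteArrayList` and `ConcurrentHashMap` document for their spliterators, shows that those flags do not make `stream()` parallel, and calls `trySplit()` by hand on an `ArrayList` spliterator to show the source being divided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class CharacteristicsDemo {
    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);

        // Characteristics describe the source, not the execution mode.
        System.out.println(
            cow.spliterator().hasCharacteristics(Spliterator.IMMUTABLE));           // true
        System.out.println(
            map.keySet().spliterator().hasCharacteristics(Spliterator.CONCURRENT)); // true
        System.out.println(cow.stream().isParallel());                              // false

        // trySplit is what carves off part of the data for another task.
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();

        // The two halves together still cover all eight elements.
        System.out.println(prefix.estimateSize() + rest.estimateSize());            // 8
    }
}
```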

Dima answered Sep 30 '22 03:09



This part is not constrained by the specification right now; however, the short answer is NO. There are parallelStream() and stream() methods, but they just give you access to parallel or sequential implementations of the common basic operations that process the stream. Currently the runtime can't assume that your operations are thread safe without an explicit parallelStream() or parallel() call, so the default implementation of stream() behaves sequentially.
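For completeness, a short sketch of the explicit opt-in and opt-out calls on an existing stream (`ModeSwitchDemo` is a made-up name). The mode belongs to the pipeline as a whole, so when both `parallel()` and `sequential()` appear, the last call decides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ModeSwitchDemo {
    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);

        // Opt in explicitly on an existing sequential stream.
        System.out.println(data.stream().parallel().isParallel());           // true

        // Opt back out; handy when a stream of unknown mode is handed to you.
        System.out.println(data.parallelStream().sequential().isParallel()); // false

        // The mode applies to the whole pipeline: the last call wins.
        Stream<Integer> s = data.stream().parallel().map(x -> x + 1).sequential();
        System.out.println(s.isParallel());                                  // false
    }
}
```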

Jairo Andres Velasco Romero answered Sep 30 '22 03:09
