
Does the specification guarantee that operations on sequential Java streams have to stay in the current thread?

Does the specification guarantee that all operations on sequential Java streams are executed in the current thread (except for "forEach" and "forEachOrdered")?

I am explicitly asking about the specification, not about what the current implementation does. I can look into the current implementation myself and don't need to bother you with that. But the implementation might change, and there are other implementations.

I'm asking because of ThreadLocals: I use a framework which uses ThreadLocals internally. Even a simple call like company.getName() eventually reads a ThreadLocal. I cannot change how that framework is designed, at least not within a sane amount of time.
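To make the concern concrete, here is a minimal sketch of why the executing thread matters. The names `CONTEXT` and `readContext()` are hypothetical stand-ins for the framework's internal state and for a call like company.getName(); they are not taken from any real framework.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalVisibility {
    // Hypothetical stand-in for the framework's internal thread-bound state.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Stand-in for a call such as company.getName() that reads the ThreadLocal internally.
    static String readContext() {
        return CONTEXT.get();
    }

    // Performs the same read on a freshly started thread and returns what that thread saw.
    static String readFromOtherThread() {
        AtomicReference<String> seen = new AtomicReference<>("unset");
        Thread t = new Thread(() -> seen.set(readContext()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        CONTEXT.set("session-1");
        System.out.println(readContext());          // prints "session-1"
        System.out.println(readFromOtherThread());  // prints "null": no binding in that thread
    }
}
```

If a stream lambda were ever run on a thread other than the one that set the value, the framework call inside it would see the unbound state in the same way.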

The specification seems confusing here. The documentation of the package "java.util.stream" states:

If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to the visibility of those side-effects to other threads, nor are there any guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread.

...

Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
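The example from the quoted documentation can be tried out with the following sketch. The class name, the `THREADS` set, and the `doubled()` method are illustrative additions; only the pipeline itself comes from the documentation. The resulting array is specified, while the set of thread names is whatever the library happened to choose on a given run.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ParallelMapThreads {
    // Names of the threads on which the mapper function was invoked.
    static final Set<String> THREADS = ConcurrentHashMap.newKeySet();

    // The pipeline from the package documentation, with the mapper recording its thread.
    static int[] doubled() {
        return IntStream.range(0, 5)
                .parallel()
                .map(x -> {
                    THREADS.add(Thread.currentThread().getName());
                    return x * 2;
                })
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubled())); // prints [0, 2, 4, 6, 8]
        System.out.println(THREADS);                    // varies from run to run
    }
}
```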

I would interpret that as: every operation on a stream can happen in a different thread. But the documentation of "forEach" and "forEachOrdered" explicitly states:

For any given element, the action may be performed at whatever time and in whatever thread the library chooses.

That statement would be redundant if every stream operation could happen in an unspecified thread. Is the opposite therefore true: are all operations on a sequential stream guaranteed to be executed in the current thread, except for "forEach" and "forEachOrdered"?

I have searched for an authoritative answer about the combination of "Java", "Stream" and "ThreadLocal" but found nothing. The closest thing was an answer by Brian Goetz to a related question here on Stack Overflow, but it is about the order, not the thread, and it is only about "forEach", not the other stream methods: Does Stream.forEach respect the encounter order of sequential streams?

Asked May 23 '18 14:05 by user194860



1 Answer

I believe the answer you are looking for is not so well defined, as it depends on the consumer and/or spliterator and their characteristics.

Before reading the main quote:

https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#stream

default Stream<E> stream()

Returns a sequential Stream with this collection as its source. This method should be overridden when the spliterator() method cannot return a spliterator that is IMMUTABLE, CONCURRENT, or late-binding. (See spliterator() for details.)

https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#binding

Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.

Spliterators and consumers have their own sets of characteristics, and those determine what is guaranteed. Suppose you are operating on a stream. Spliterators are not expected to be thread-safe, and a spliterator may hand elements over to other spliterators that may be traversed in another thread. Whether the stream is sequential or not, that means there is no guarantee. However, if no split occurs, the quotes lead to the following: under a single spliterator, the operations remain in the same thread. Any event that leads to a split voids that assumption; otherwise it holds.
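As a rough way to observe the no-split case, the sketch below records the thread on which each lambda of a sequential pipeline runs. The class and method names are hypothetical, and the check only reports what a particular JDK does on a particular run; it is an observation of an implementation, not evidence of a guarantee in the specification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class SequentialThreadProbe {
    // Threads observed while the pipeline's lambdas ran.
    static final Set<Thread> OBSERVED = ConcurrentHashMap.newKeySet();

    // Sequential pipeline: parallel() is never called, so no splitting is requested.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .filter(n -> {
                    OBSERVED.add(Thread.currentThread());
                    return !n.isEmpty();
                })
                .map(n -> {
                    OBSERVED.add(Thread.currentThread());
                    return n.toUpperCase();
                })
                .collect(Collectors.toList());
    }

    // True if every recorded lambda invocation happened on the thread calling this method.
    static boolean ranOnlyOnCallingThread() {
        return OBSERVED.size() == 1 && OBSERVED.contains(Thread.currentThread());
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("acme", "", "globex"))); // prints [ACME, GLOBEX]
        System.out.println(ranOnlyOnCallingThread()); // implementation-dependent observation
    }
}
```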

Answered Oct 26 '22 09:10 by Victor
