Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange behavior of Stream.spliterator for parallel streams

I'm using the stream spliterator directly for the low-level operations in the library I'm writing. Recently I discovered very weird behavior when I take the stream spliterator and interleave tryAdvance/trySplit calls. Here's a simple code which demonstrates the problem:

import java.util.Arrays;
import java.util.Spliterator;

public class SpliteratorBug {
    public static void main(String[] args) {
        Integer[][] input = { { 1 }, { 2, 3 }, { 4, 5, 6 }, { 7, 8 }, { 9 } };
        Spliterator<Integer> spliterator = Arrays.stream(input).parallel()
                .flatMap(Arrays::stream).spliterator();
        spliterator.trySplit();
        spliterator.tryAdvance(s -> {});
        spliterator.trySplit();
        spliterator.forEachRemaining(System.out::println);
    }
}

The output is

5
6
9

As you can see, after flat-mapping I should get the ordered stream of consecutive numbers from 1 to 9. I split the spliterator once, so it should jump to some intermediate location. Next I consume an element from it and split it one more time. After that I print all the remaining elements. I expect that I will have several consecutive elements from the stream tail (probably zero elements, it would also be fine). However what I get is 5 and 6, then sudden jump to 9.

I know that currently in JDK spliterators are not used this way: they always split before the traversal. However official documentation does not explicitly forbid to call the trySplit after tryAdvance.

The problem was never observed when I use spliterator created directly from collection, array, generated source, etc. It's observed only if the spliterator was created from the parallel stream which had the intermediate flatMap.

So the question is: did I hit the bug or it's explicitly forbidden somewhere to use the spliterator in this way?

like image 901
Tagir Valeev Avatar asked Jul 01 '15 04:07

Tagir Valeev


People also ask

What are the features of parallel stream?

A parallel stream is executed by different threads, running on multiple CPU cores in a computer. The stream elements are split into substreams that are processed by multiple instances of the stream pipeline being executed in multiple threads.

Which is better stream or parallel stream?

Parallel stream leverage multi-core processors, which increases its performance. Using parallel streams, our code gets divide into multiple streams which can be executed parallelly on separate cores of the system and the final result is shown as the combination of all the individual core's outcomes.

What is difference between parallel stream and normal stream?

In the case of a sequential stream, the content of the list is printed in an ordered sequence. The output of the parallel stream, on the other hand, is unordered and the sequence changes every time the program is run.

When should you avoid parallel stream?

Similarly, don't use parallel if the stream is ordered and has much more elements than you want to process, e.g. This may run much longer because the parallel threads may work on plenty of number ranges instead of the crucial one 0-100, causing this to take very long time.


2 Answers

From the documentation of Spliterator.trySplit():

This method may return null for any reason, including emptiness, inability to split after traversal has commenced, data structure constraints, and efficiency considerations.

(emphasis mine)

So the documentation explicitly mentions the possibility to attempt splitting after commencing traversal and suggests that spliterators which are unable to handle this may return null.

So for ordered spliterators, the observed behavior should considered a bug as described by Misha. Generally, the fact that trySplit() has to return a prefix spliterator, in other words, has to hand over all intermediate state regarding the next items to the new spliterator, is a peculiarity of the Spliterator API that makes bugs likely. I took this question as a motive for checking my own spliterator implementations and found a similar bug…

like image 184
Holger Avatar answered Sep 28 '22 08:09

Holger


From what I can see from the source of AbstractWrappingSpliterator and company, when you tryAdvance, the output of flatMap (4,5,6) gets buffered and then 4 gets consumed leaving (5,6) in the buffer. Then trySplit correctly splits off (7,8) to the new Spliterator leaving 9 in old one but the buffered (5,6) stay with the old Spliterator.

So this looks like a bug to me. It should either hand the buffer off to the new Spliterator or return null and refuse to split if the buffer is not empty.

like image 27
Misha Avatar answered Sep 28 '22 09:09

Misha