I'm using the stream spliterator directly for the low-level operations in the library I'm writing. Recently I discovered very weird behavior when I take the stream spliterator and interleave tryAdvance/trySplit calls. Here's a simple piece of code that demonstrates the problem:
import java.util.Arrays;
import java.util.Spliterator;

public class SpliteratorBug {
    public static void main(String[] args) {
        Integer[][] input = { { 1 }, { 2, 3 }, { 4, 5, 6 }, { 7, 8 }, { 9 } };
        Spliterator<Integer> spliterator = Arrays.stream(input).parallel()
                .flatMap(Arrays::stream).spliterator();
        spliterator.trySplit();
        spliterator.tryAdvance(s -> {});
        spliterator.trySplit();
        spliterator.forEachRemaining(System.out::println);
    }
}
The output is
5
6
9
As you can see, after flat-mapping I should get an ordered stream of consecutive numbers from 1 to 9. I split the spliterator once, so it should jump to some intermediate location. Next I consume one element from it and split it one more time. After that I print all the remaining elements. I expect to get several consecutive elements from the stream tail (possibly zero elements, which would also be fine). However, what I get is 5 and 6, then a sudden jump to 9.
I know that currently in the JDK spliterators are not used this way: they always split before the traversal. However, the official documentation does not explicitly forbid calling trySplit after tryAdvance.
The problem is never observed when I use a spliterator created directly from a collection, array, generated source, etc. It occurs only if the spliterator was created from a parallel stream which has an intermediate flatMap.
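To illustrate the comparison, here is a minimal sketch (same interleaving, but on a spliterator taken directly from an array) where the remaining elements stay consecutive:

import java.util.Arrays;
import java.util.Spliterator;

public class ArraySpliteratorOk {
    public static void main(String[] args) {
        Integer[] input = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        // Spliterator taken directly from the array, no flatMap involved
        Spliterator<Integer> spliterator = Arrays.spliterator(input);
        spliterator.trySplit();             // split off a prefix
        spliterator.tryAdvance(s -> {});    // consume one element
        spliterator.trySplit();             // split again after traversal started
        // prints 8 and 9 here: a consecutive tail, no gap
        spliterator.forEachRemaining(System.out::println);
    }
}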
So the question is: did I hit a bug, or is it explicitly forbidden somewhere to use a spliterator this way?
A parallel stream is executed by different threads running on multiple CPU cores. The stream elements are split into substreams that are processed by multiple instances of the stream pipeline executing in multiple threads.
Parallel streams leverage multi-core processors, which can increase performance: the work is divided across several streams that execute in parallel on separate cores, and the final result is the combination of each core's outcome.
In the case of a sequential stream, the content of a list is printed in encounter order. The output of a parallel stream, on the other hand, may be unordered, and the sequence can change every time the program is run.
Similarly, don't use a parallel stream if the stream is ordered and has many more elements than you actually want to process; the parallel threads may work on plenty of number ranges instead of the crucial one (0-100), causing the operation to take a very long time.
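As a small generic illustration of that ordering difference (a toy sketch, not related to the spliterator code in the question):

import java.util.Arrays;
import java.util.List;

public class OrderingDemo {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        // Sequential stream: always prints 1..9 in encounter order
        list.stream().forEach(n -> System.out.print(n + " "));
        System.out.println();

        // Parallel stream with forEach: the order may change from run to run
        list.parallelStream().forEach(n -> System.out.print(n + " "));
        System.out.println();

        // Parallel stream with forEachOrdered: encounter order is restored
        list.parallelStream().forEachOrdered(n -> System.out.print(n + " "));
        System.out.println();
    }
}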
From the documentation of Spliterator.trySplit():

    This method may return null for any reason, including emptiness, *inability to split after traversal has commenced*, data structure constraints, and efficiency considerations.

(emphasis mine)

So the documentation explicitly mentions the possibility of attempting to split after commencing traversal and suggests that spliterators which are unable to handle this may return null.
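So a caller that interleaves traversal and splitting has to be prepared for a null result. A minimal sketch of such defensive usage (the helper method below is made up for illustration, it is not a JDK API):

import java.util.Spliterator;
import java.util.function.Consumer;

public class DefensiveSplit {
    // Hypothetical helper: attempt one split after traversal has commenced
    // and fall back to plain traversal if the spliterator refuses (returns null).
    static <T> void advanceThenMaybeSplit(Spliterator<T> sp, Consumer<? super T> action) {
        sp.tryAdvance(action);                 // traversal has now commenced
        Spliterator<T> prefix = sp.trySplit(); // may legitimately return null here
        if (prefix != null) {
            prefix.forEachRemaining(action);   // in real code this would go to another worker
        }
        sp.forEachRemaining(action);           // drain the rest either way
    }
}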
So for ordered spliterators, the observed behavior should be considered a bug, as described by Misha. Generally, the fact that trySplit() has to return a prefix spliterator, in other words, has to hand over all intermediate state regarding the next items to the new spliterator, is a peculiarity of the Spliterator API that makes bugs likely. I took this question as a motive for checking my own spliterator implementations and found a similar bug…
From what I can see in the source of AbstractWrappingSpliterator and company, when you tryAdvance, the output of flatMap (4, 5, 6) gets buffered and then 4 gets consumed, leaving (5, 6) in the buffer. Then trySplit correctly splits off (7, 8) into the new Spliterator, leaving 9 in the old one, but the buffered (5, 6) stays with the old Spliterator.
So this looks like a bug to me. It should either hand the buffer off to the new Spliterator, or return null and refuse to split if the buffer is not empty.
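The second option is straightforward to express. A rough sketch of a flattening spliterator that refuses to split while its buffer is non-empty (the class and field names are invented, this is not the actual AbstractWrappingSpliterator code):

import java.util.ArrayDeque;
import java.util.Spliterator;
import java.util.function.Consumer;

// Sketch only: flattens a spliterator of Iterables and never splits while
// elements sit in its internal buffer, so no buffered element can be skipped.
final class FlatteningSpliterator<T> implements Spliterator<T> {
    private final Spliterator<? extends Iterable<T>> source;
    private final ArrayDeque<T> buffer = new ArrayDeque<>();

    FlatteningSpliterator(Spliterator<? extends Iterable<T>> source) {
        this.source = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        while (buffer.isEmpty()) {
            // pull the next inner Iterable and buffer all of its elements
            if (!source.tryAdvance(inner -> inner.forEach(buffer::add))) {
                return false; // source exhausted
            }
        }
        action.accept(buffer.poll());
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        // Refuse to split while the buffer is non-empty; otherwise the buffered
        // elements (the 5, 6 in the question) would stay behind the split point
        // while later elements move away, producing a gap.
        if (!buffer.isEmpty()) {
            return null;
        }
        Spliterator<? extends Iterable<T>> prefix = source.trySplit();
        if (prefix == null) {
            return null;
        }
        return new FlatteningSpliterator<T>(prefix);
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // flattened size is unknown
    }

    @Override
    public int characteristics() {
        return ORDERED; // keep only ordering; sizes are unknown after flattening
    }
}

Handing the buffer over to the returned prefix instead would also preserve encounter order, but it requires more bookkeeping.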