Stream spliterator implementation detail

While looking into the source code of WrappingSpliterator::trySplit, I was quite misled by its implementation:

    @Override
    public Spliterator<P_OUT> trySplit() {
        if (isParallel && buffer == null && !finished) {
            init();

            Spliterator<P_IN> split = spliterator.trySplit();
            return (split == null) ? null : wrap(split);
        }
        else
            return null;
    }

If you are wondering why this matters, it is because, for example, this:

    Arrays.asList(1,2,3,4,5)
          .stream()
          .filter(x -> x != 1)
          .spliterator();

uses it. As I understand it, adding any intermediate operation to a stream causes that code to be used.

Basically, this method says that unless the stream is parallel, this Spliterator must be treated as one that cannot be split at all. And this matters to me. In one of my methods (this is how I got to that code), I get a Stream as input and "parse" it in smaller pieces, manually, with trySplit. Think, for example, of trying to implement a findLast for a Stream.
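To make the findLast idea concrete, here is a minimal sketch of what such manual splitting could look like. The class FindLastSketch and the helper findLast are hypothetical names invented for illustration, not the code from my project; the sketch assumes an ORDERED spliterator whose trySplit hands back a prefix, and non-null elements.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicReference;

public class FindLastSketch {

    // Hypothetical helper: split off prefixes and search the remaining suffix first.
    // Assumes an ORDERED spliterator (trySplit returns a strict prefix) and non-null elements.
    static <T> Optional<T> findLast(Spliterator<T> sp) {
        Spliterator<T> prefix = sp.trySplit();
        if (prefix == null) {
            // Cannot split any further: traverse what is left and remember the last element seen.
            AtomicReference<T> last = new AtomicReference<>();
            sp.forEachRemaining(last::set);
            return Optional.ofNullable(last.get());
        }
        // After a successful split, sp covers the suffix, so look there first.
        Optional<T> inSuffix = findLast(sp);
        // Only fall back to the prefix if the suffix turned out to be empty.
        return inSuffix.isPresent() ? inSuffix : findLast(prefix);
    }

    public static void main(String[] args) {
        System.out.println(findLast(Arrays.asList(1, 2, 3, 4, 5).stream().spliterator()));
    }
}
```

When trySplit keeps returning prefixes, only a small tail of the data is traversed; when the very first call returns null, the helper degrades to a full traversal.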

And this is where my desire to split into smaller chunks is nuked, because as soon as I do:

    Spliterator<T> sp = stream.spliterator();
    Spliterator<T> prefixSplit = sp.trySplit();

I find out that prefixSplit is null, meaning that I can't do anything other than consume the entire sp with forEachRemaining.
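A small, self-contained way to reproduce this, assuming a JDK whose WrappingSpliterator behaves like the trySplit quoted above. The class TrySplitDemo and the helper canSplit are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class TrySplitDemo {

    // Illustrative helper: reports whether the first trySplit() call yields a prefix.
    static boolean canSplit(Stream<?> stream) {
        return stream.spliterator().trySplit() != null;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);

        // Sequential pipeline with an intermediate operation: isParallel is false.
        System.out.println(canSplit(list.stream().filter(x -> x != 1)));

        // Same pipeline, but the stream has been switched to parallel.
        System.out.println(canSplit(list.stream().parallel().filter(x -> x != 1)));
    }
}
```

Given the isParallel check in trySplit, the first call should report false and the second true; the two pipelines differ only in the parallel flag.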

This is a bit weird. Maybe it makes some sense when filter is present, because in that case the only way (as I understand it) a Spliterator could be returned is by using some kind of buffer, maybe even one with a predefined size (much like Files::lines). But why this:

    Arrays.asList(1,2,3,4)
          .stream()
          .sorted()
          .spliterator()
          .trySplit();

returns null is something I don't understand. sorted is a stateful operation that buffers the elements anyway, without reducing or increasing their number, so at least theoretically this could return something other than null...

Eugene asked Apr 02 '19 20:04



1 Answer

When you invoke spliterator() on a Stream, there are only two possible outcomes with the current implementation.

If the stream has no intermediate operations, you’ll get the source spliterator that was used to construct the stream. Its splitting capability is entirely independent of the stream’s parallel state; in fact, the spliterator doesn’t know anything about the stream.
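This first outcome can be checked with a short sketch. It relies on the current OpenJDK behavior of handing back the very same spliterator instance, which is an implementation detail rather than a documented guarantee; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

public class SourceSpliteratorDemo {

    // True if a stream without intermediate operations returns the spliterator it was built from.
    static boolean returnsSource(boolean parallel) {
        Spliterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).spliterator();
        return StreamSupport.stream(source, parallel).spliterator() == source;
    }

    // True if the same identity still holds once filter() has been added to the pipeline.
    static boolean returnsSourceWithFilter(boolean parallel) {
        Spliterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).spliterator();
        return StreamSupport.stream(source, parallel).filter(x -> x != 1).spliterator() == source;
    }

    public static void main(String[] args) {
        System.out.println(returnsSource(false));
        System.out.println(returnsSource(true));
        System.out.println(returnsSourceWithFilter(false));
    }
}
```

Because the returned object is the source spliterator itself, whether trySplit succeeds depends only on that spliterator, not on the stream’s parallel flag.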

Otherwise, you’ll get a WrappingSpliterator, which encapsulates a source Spliterator and a pipeline state, expressed as a PipelineHelper. This combination of Spliterator and PipelineHelper is not required to work in parallel and, in fact, would not work in the case of distinct(), as the WrappingSpliterator gets an entirely different combination depending on whether the Stream is parallel or not.

For stateless intermediate operations, it would not make a difference, though. But, as discussed in “Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?”, the WrappingSpliterator is a “one-fits-all implementation” that doesn’t consider the actual nature of the pipeline, so its limitations are the superset of all possible limitations of all supported pipeline stages. So the existence of a single scenario that wouldn’t work if the parallel flag were ignored is enough to forbid splitting for all pipelines when the stream is not parallel.

Holger answered Oct 18 '22 00:10
