In trying to understand Java streams and spliterators more deeply, I have some subtle questions about spliterator characteristics:
Q1: Stream.empty() vs Stream.of() (Stream.of() with no arguments)

Stream.empty(): SUBSIZED, SIZED
Stream.of(): SUBSIZED, IMMUTABLE, SIZED, ORDERED

Why doesn't Stream.empty() have the same characteristics as Stream.of()? Note that this has consequences when used in conjunction with Stream.concat() (especially the missing ORDERED). I would say that Stream.empty() should have not just IMMUTABLE and ORDERED but also DISTINCT and NONNULL. It would also make sense for Stream.of() with a single argument to report DISTINCT.
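For reference, here is a minimal sketch that prints which characteristic bits a stream's spliterator reports (the class and helper names are just illustrative, not JDK API); it reproduces the listings above on the JDK builds I tried:

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class CharacteristicsDump {
    // illustrative helper, not a JDK method: prints the characteristic bits a spliterator reports
    static void dump(String label, Spliterator<?> sp) {
        StringBuilder sb = new StringBuilder(label).append(':');
        if (sp.hasCharacteristics(Spliterator.SIZED))      sb.append(" SIZED");
        if (sp.hasCharacteristics(Spliterator.SUBSIZED))   sb.append(" SUBSIZED");
        if (sp.hasCharacteristics(Spliterator.ORDERED))    sb.append(" ORDERED");
        if (sp.hasCharacteristics(Spliterator.SORTED))     sb.append(" SORTED");
        if (sp.hasCharacteristics(Spliterator.DISTINCT))   sb.append(" DISTINCT");
        if (sp.hasCharacteristics(Spliterator.NONNULL))    sb.append(" NONNULL");
        if (sp.hasCharacteristics(Spliterator.IMMUTABLE))  sb.append(" IMMUTABLE");
        if (sp.hasCharacteristics(Spliterator.CONCURRENT)) sb.append(" CONCURRENT");
        System.out.println(sb);
    }

    public static void main(String[] args) {
        dump("Stream.empty()", Stream.empty().spliterator());
        dump("Stream.of()", Stream.of().spliterator());
    }
}
```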
Q2: LongStream.of() not having NONNULL

I just noticed that NONNULL is not reported by LongStream.of(). Isn't NONNULL an essential characteristic of all LongStreams, IntStreams and DoubleStreams?
Q3: LongStream.range(,) vs LongStream.range(,).boxed()

LongStream.range(,): SUBSIZED, IMMUTABLE, NONNULL, SIZED, ORDERED, SORTED, DISTINCT
LongStream.range(,).boxed(): SUBSIZED, SIZED, ORDERED

Why does .boxed() lose all these characteristics? It shouldn't lose any. I understand that .mapToObj() may lose NONNULL, IMMUTABLE and DISTINCT, but .boxed()... it doesn't make sense.
Q4: .peek() loses IMMUTABLE and NONNULL

LongStream.of(1): SUBSIZED, IMMUTABLE, NONNULL, SIZED, ...
LongStream.of(1).peek(): SUBSIZED, SIZED, ...

Why does .peek() lose these characteristics? .peek() really shouldn't lose any.
Q5: .skip(), .limit() lose SUBSIZED, IMMUTABLE, NONNULL, SIZED

I just noticed that these operations lose SUBSIZED, IMMUTABLE, NONNULL and SIZED. Why? If the size is available upstream, it would be easy to calculate the final size as well.
Q6: .filter() loses IMMUTABLE, NONNULL

I just noticed that this operation also loses SUBSIZED, IMMUTABLE, NONNULL and SIZED. It makes sense to lose SUBSIZED and SIZED, but the other two don't. Why?
I would appreciate it if someone who deeply understands spliterators could bring some clarity. Thanks.
I have to admit that I had difficulties too when I first tried to find out the actual meaning of the characteristics, and I had the feeling that their meaning was not clearly settled during the implementation phase of Java 8 and that they are, for that reason, used inconsistently.
Consider Spliterator.IMMUTABLE:

“Characteristic value signifying that the element source cannot be structurally modified; that is, elements cannot be added, replaced, or removed, so such changes cannot occur during traversal.”

It’s strange to see “replaced” in this list, which is usually not considered a structural modification when speaking of a List or an array; consequently, stream and spliterator factories accepting an array (that is not cloned), like LongStream.of(…) or Arrays.spliterator(long[]), report IMMUTABLE.
If we interpret this more generously as “as long as not observable by the client”, there is no significant difference to CONCURRENT, as in either case some elements will be reported to the client without any way to recognize whether they were added during traversal or whether some were unreported due to removal, as there is no way to rewind a spliterator and compare.
The specification continues:

“A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy (for example, throwing ConcurrentModificationException) concerning structural interference detected during traversal.”
And that’s the only relevant thing: a spliterator reporting either IMMUTABLE or CONCURRENT is guaranteed never to throw a ConcurrentModificationException. Of course, CONCURRENT precludes SIZED semantically, but that has no consequence for the client code.
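To make that policy distinction concrete, here is a minimal sketch (my own illustration, not from the specification) contrasting a source that reports neither characteristic with one that reports IMMUTABLE; only the first one throws:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ModificationPolicyDemo {
    public static void main(String[] args) {
        // ArrayList's spliterator reports neither IMMUTABLE nor CONCURRENT; its documented
        // policy is to throw ConcurrentModificationException on structural interference
        // (on a best-effort basis), which reliably triggers here.
        List<String> failFast = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            failFast.stream().forEach(s -> failFast.add("x"));
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList source: " + e);
        }

        // CopyOnWriteArrayList's spliterator traverses a snapshot and reports IMMUTABLE,
        // so the concurrent add is simply not observed by the stream and nothing is thrown.
        List<String> snapshot = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        snapshot.stream().forEach(s -> snapshot.add("x"));
        System.out.println("CopyOnWriteArrayList source finished, size is now " + snapshot.size());
    }
}
```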
In fact, these characteristics are not used for anything in the Stream API, hence, using them inconsistently never gets noticed anywhere. This is also the explanation why every intermediate operation has the effect of clearing the CONCURRENT, IMMUTABLE and NONNULL characteristics: the Stream implementation doesn’t use them, and its internal classes representing the stream state do not maintain them.
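This is easy to observe. In the following sketch (mine; the output reflects the Java 8 behavior discussed here), even a no-op peek, which cannot possibly introduce nulls or modify anything, drops those bits from the reported characteristics:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class IntermediateOpsDropBits {
    // illustrative helper that shows only the bits relevant to this discussion
    static String show(Spliterator.OfLong sp) {
        return (sp.hasCharacteristics(Spliterator.IMMUTABLE) ? "IMMUTABLE " : "")
             + (sp.hasCharacteristics(Spliterator.NONNULL)   ? "NONNULL "   : "")
             + (sp.hasCharacteristics(Spliterator.SIZED)     ? "SIZED"      : "");
    }

    public static void main(String[] args) {
        // the source spliterator of LongStream.range reports IMMUTABLE and NONNULL
        System.out.println(show(LongStream.range(0, 10).spliterator()));
        // after a no-op peek, the wrapping spliterator no longer reports them,
        // because the pipeline simply does not carry these flags along
        System.out.println(show(LongStream.range(0, 10).peek(x -> {}).spliterator()));
    }
}
```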
Likewise, NONNULL is not used anywhere, so its absence for certain streams has no effect. I could track the LongStream.of(…) issue down to the internal use of Arrays.spliterator(long[], int, int), which delegates to Spliterators.spliterator(long[] array, int fromIndex, int toIndex, int additionalCharacteristics):
“The returned spliterator always reports the characteristics SIZED and SUBSIZED. The caller may provide additional characteristics for the spliterator to report. (For example, if it is known the array will not be further modified, specify IMMUTABLE; if the array data is considered to have an encounter order, specify ORDERED). The method Arrays.spliterator(long[], int, int) can often be used instead, which returns a spliterator that reports SIZED, SUBSIZED, IMMUTABLE, and ORDERED.”
Note (again) the inconsistent use of the IMMUTABLE characteristic. It is again treated as having to guarantee the absence of any modification, while at the same time, Arrays.spliterator, and in turn Arrays.stream and LongStream.of(…), will report the IMMUTABLE characteristic, even by specification, without being able to guarantee that the caller won’t modify their array. Unless we consider setting an element not to be a structural modification, but then, the entire distinction becomes nonsensical again, as arrays can’t be structurally modified.
And it clearly specifies no NONNULL characteristic. While it is obvious that primitive values can’t be null, and the Spliterators.Abstract<Primitive>Spliterator classes invariably inject a NONNULL characteristic, the spliterator returned by Spliterators.spliterator(long[], int, int, int) does not inherit from Spliterators.AbstractLongSpliterator.
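This is easy to confirm. In this small sketch (my own), a spliterator over a long[] does not report NONNULL even though it can only ever deliver primitive long values; on the builds I tried, both checks print false:

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.LongStream;

public class PrimitiveNonNullCheck {
    public static void main(String[] args) {
        Spliterator.OfLong fromArray  = Arrays.spliterator(new long[] {1L, 2L, 3L});
        Spliterator.OfLong fromStream = LongStream.of(1L, 2L, 3L).spliterator();

        // both print false: the array-based spliterator factories in Spliterators
        // simply do not include NONNULL among the reported characteristics
        System.out.println(fromArray.hasCharacteristics(Spliterator.NONNULL));
        System.out.println(fromStream.hasCharacteristics(Spliterator.NONNULL));
    }
}
```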
The bad thing is that this can’t be fixed without changing the specification; the good thing is that it has no consequences anyway.
So if we ignore any issues with CONCURRENT, IMMUTABLE, or NONNULL, which have no consequences, we are left with the following.

SIZED and skip & limit: this is a well-known issue, a result of the way skip and limit have been implemented by the Stream API. Other implementations are imaginable. This also applies to the combination of an infinite stream with a limit, which should have a predictable size but, given the current implementation, does not.
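A minimal sketch of that issue (my own; output as observed on the builds I tried), showing that limit discards the SIZED characteristic even when the resulting size would be trivially computable:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class SliceLosesSized {
    public static void main(String[] args) {
        // the range alone knows its exact size
        Spliterator.OfLong plain = LongStream.range(0, 100).spliterator();
        System.out.println(plain.hasCharacteristics(Spliterator.SIZED)
                           + " / estimate " + plain.estimateSize());

        // after limit(10), the spliterator no longer claims SIZED,
        // even though the result size is perfectly predictable
        Spliterator.OfLong sliced = LongStream.range(0, 100).limit(10).spliterator();
        System.out.println(sliced.hasCharacteristics(Spliterator.SIZED)
                           + " / estimate " + sliced.estimateSize());

        // the same applies to an infinite stream capped by limit
        Spliterator<Double> capped = Stream.generate(Math::random).limit(10).spliterator();
        System.out.println(capped.hasCharacteristics(Spliterator.SIZED));
    }
}
```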
Combining Stream.concat(…) with Stream.empty(): it sounds reasonable that an empty stream does not impose constraints on the result order, but Stream.concat(…)’s behavior of dropping the order when only one input has no order is questionable. Note that being too aggressive regarding ordering is nothing new; see this Q&A regarding a behavior that was considered intentional at first, but then was fixed as late as Java 8, update 60. Perhaps Stream.concat should have been discussed right at that point in time too…
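A small sketch of that effect (my own; output as observed on the builds I tried):

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class ConcatOrderingCheck {
    public static void main(String[] args) {
        Spliterator<String> alone  = Stream.of("a", "b", "c").spliterator();
        Spliterator<String> merged = Stream.concat(Stream.of("a", "b", "c"), Stream.empty())
                                           .spliterator();

        // Stream.of(...) reports ORDERED, but the concatenation with an empty stream
        // does not, because Stream.empty()'s spliterator lacks ORDERED and concat
        // only keeps characteristics common to both inputs.
        System.out.println(alone.hasCharacteristics(Spliterator.ORDERED));   // true
        System.out.println(merged.hasCharacteristics(Spliterator.ORDERED));  // false on the builds I tried
    }
}
```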
The behavior of .boxed() is easy to explain. Since it has been implemented naively as .mapToObj(Long::valueOf), it simply loses all that knowledge, as mapToObj cannot assume that the result is still sorted or distinct. But this has been fixed with Java 9. There, LongStream.range(0,10).boxed() has the SUBSIZED|SIZED|ORDERED|SORTED|DISTINCT characteristics, maintaining all characteristics that have relevance to the implementation.
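A minimal sketch to check this on your own JDK (the expected output is version dependent, as described above):

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class BoxedCharacteristics {
    public static void main(String[] args) {
        Spliterator<Long> sp = LongStream.range(0, 10).boxed().spliterator();

        // On Java 8 both lines printed false; from Java 9 on, boxed() keeps
        // SORTED and DISTINCT (along with the other flags the implementation tracks).
        System.out.println(sp.hasCharacteristics(Spliterator.SORTED));
        System.out.println(sp.hasCharacteristics(Spliterator.DISTINCT));
    }
}
```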