
Java 8 Stream API - Does any stateful intermediate operation guarantee a new source collection?

Is the following statement true?

The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.

(Source and source - they seem to copy from each other or come from the same source.)

Disclaimer: I am aware that the following snippets are not legitimate uses of the Java Stream API. Do not use them in production code.

I tested Stream::sorted using a snippet from the sources above:

    final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

    list.stream()
        .filter(i -> i > 5)
        .sorted()
        .forEach(list::remove);

    System.out.println(list);            // Prints [0, 1, 2, 3, 4, 5]

It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:

    final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

    list.stream()
        .filter(i -> i > 5)
        .distinct()
        .forEach(list::remove);          // Throws NullPointerException

To my surprise, a NullPointerException is thrown.

All the tested methods are stateful intermediate operations. Yet this unique behavior of Stream::sorted is not documented, nor does the "Stream operations and pipelines" part of the documentation explain whether stateful intermediate operations really guarantee a new source collection.

Where does my confusion come from, and what explains the behavior above?

Nikolas Charalambidis asked Sep 11 '18 09:09



1 Answer

The API documentation makes no guarantee “that subsequent operations no longer operate on the backing collection”; hence, you should never rely on such behavior of a particular implementation.

Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.
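For comparison, here is a minimal sketch of how the same result could be reached using only documented behavior. It assumes the goal is simply to drop every element greater than a threshold; the class name SafeRemoval and the helper removeGreaterThan are hypothetical, introduced only for illustration. The list type is requested explicitly, since Collectors.toList() does not promise a mutable list, and Collection.removeIf does the removal without any stream traversing the list at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SafeRemoval {

    // Hypothetical helper: builds the list [0, 10) and removes every element above the threshold.
    static List<Integer> removeGreaterThan(int threshold) {
        // Ask for an ArrayList explicitly; Collectors.toList() makes no promise about mutability.
        List<Integer> list = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(ArrayList::new));

        // removeIf is a documented bulk operation; the list is not a stream source while it changes.
        list.removeIf(i -> i > threshold);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeGreaterThan(5)); // prints [0, 1, 2, 3, 4, 5]
    }
}
```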

To show a counter-example:

    Set<Integer> set = IntStream.range(0, 10).boxed()
        .collect(Collectors.toCollection(TreeSet::new));
    set.stream()
        .filter(i -> i > 5)
        .sorted()
        .forEach(set::remove);

throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could apply the same optimization to your original example: forEach explicitly performs the action in no specified order, so the sorting is unnecessary.
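The "already sorted" property mentioned above is something a source can advertise through its spliterator characteristics. The sketch below only inspects the documented SORTED characteristic of the two collection types; whether and how a given stream implementation uses that flag to skip sorting is an implementation detail and is assumed here, not specified. The class name SortedCharacteristic and the helper reportsSorted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SortedCharacteristic {

    // Hypothetical helper: true if the collection's spliterator reports SORTED.
    static boolean reportsSorted(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(ArrayList::new));
        TreeSet<Integer> set = new TreeSet<>(list);

        // An ArrayList happens to hold ascending values here, but it does not advertise that.
        System.out.println(reportsSorted(list)); // prints false
        // A TreeSet documents that its spliterator reports SORTED.
        System.out.println(reportsSorted(set));  // prints true
    }
}
```

A pipeline that knows its input is already sorted has no reason to buffer the elements, so the terminal operation may end up traversing the live collection directly.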

There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.
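To illustrate why such a rewrite would be legal, the following sketch compares the two formulations by result only. It does not show what any particular implementation does internally; the class name FirstOfSorted and both helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class FirstOfSorted {

    // Sorts the whole stream, then takes the first element.
    static Optional<Integer> viaSort(List<Integer> values) {
        return values.stream().sorted().findFirst();
    }

    // Finds the minimum directly, without sorting.
    static Optional<Integer> viaMin(List<Integer> values) {
        return values.stream().min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(7, 3, 9, 1, 4);
        System.out.println(viaSort(values)); // prints Optional[1]
        System.out.println(viaMin(values));  // prints Optional[1]
    }
}
```

Since both pipelines yield the same observable result, an implementation would be free to evaluate the first as the second, and the buffered copy that the question's snippet depends on would no longer exist.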

So the bottom line is: when you rely on unspecified behavior, what happens to work today may break tomorrow, when new optimizations are added.

Holger answered Sep 23 '22 12:09