Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How specialized are the Stream implementations returned by the standard collections?

Stream is an interface so whenever one gets hold of a Stream object there are lots of implementation specific details hidden.

For example, take the following code:

List<String> list = new ArrayList<>();
...
int size = list.stream()
               .count();

Does it run in constant or linear time? Or this:

Set<String> set = new TreeSet<>();
...
set.stream()
   .sorted()
   .forEach(System.out::println);

Would that be O(n) or O(n log n)?

In general, how specialized are the streams implementations returned by the standard collections?

like image 525
aioobe Avatar asked Feb 16 '15 16:02

aioobe


2 Answers

Does it run in constant or linear time?

The current implementation runs in linear time:

public final long count() {
    return mapToLong(e -> 1L).sum();
}

But, this could be improved (there's an RFE for this somewhere) to run in constant time in some situations.

How? A stream is described by a stream source, zero or more intermediate operations, and a terminal operation (here, count() is the terminal operation). The stream implementation maintains a set of characteristics about the source, and knows how the characteristics are modified by the operations. For example, a stream backed by a Collection has the characteristic SIZED, whereas a stream backed by an Iterator is not sized. Similarly, the operation map() is size-preserving, but the operation filter() destroys any a priori knowledge of sized-ness. The stream implementation knows the composed characteristics of the pipeline before it starts the terminal operation, so it knows whether the source is sized and whether all stages are size-preserving, and in such cases, could simply ask the source for the size and bypass all the actual stream computation. (But the implementation in Java 8 does not happen to do this.)

Note that the streams need not be specialized to support this; the Collection classes create the stream with a Spliterator that knows its characteristics, so a specialized implementation for Collections is not needed, just updating the shared implementation to take advantage of this particular bit of information.

like image 130
Brian Goetz Avatar answered Nov 03 '22 00:11

Brian Goetz


The sorted() method does not change the size, so it would, in theory, be possible that a future implementation could do a chain of stream().sorted().count() in O(1) time.

Take a look at the [speedment] open source implementation of the Stream interface at https://github.com/speedment/speedment. These streams can introspect their own pipeline and optimize away one or several steps in the stream.

like image 33
Per-Åke Minborg Avatar answered Nov 02 '22 23:11

Per-Åke Minborg