Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

performance of Stream.sorted().limit()

Java Streams sport both sorted and limit methods, which respectively return a sorted version of a stream and return a stream just returning a specified number of items of a stream. When these operations are applied in succession, such as in:

stream.sorted().limit(qty).collect(Collectors.toList())

is the sorting is performed in a way that sorts qty items or is the entire list sorted? In other words, if qty is fixed, is this operation in O(n)? The documentation doesn't specify the performance of these methods alone or in conjunction with each other.

The reason I ask is that the obvious imperative implementation of these operations would be to sort and then limit, taking time Θ(n * log(n)). But these operations together can be performed in O(n * log(qty)) and a smart streaming framework could view the entire stream before executing it to optimize this special case.

like image 522
Reinstate Monica Avatar asked Jul 22 '15 22:07

Reinstate Monica


3 Answers

Let me start by making the general point that the Java language specification places few restrictions on how streams are implemented. So it's really not too meaningful to ask about the performance of Java streams: it will vary significantly between implementations.

Also note that Stream is an interface. You can create your own class that implements Stream to have any performance or special behaviour on sorted that you want. So really asking about the performance of Stream makes no sense even within the context of one implementation. The OpenJDK implementation has lots of classes that implement the Stream interface.

Having said that, if we look at the OpenJDK implementation, sorting of streams ends up in SortedOps class (see source here) you will find that the sorting methods end up returning extensions of stateful operations. For example:

private static final class OfInt extends IntPipeline.StatefulOp<Integer>

These methods check if the upstream is already sorted in which case they just pass it to the downstream. They also have special exceptions for sized streams (i.e. upstream) which preallocate the arrays that they end up sorting which will improve efficiency (over a SpinedBuffer that they use for unknown size streams). But whenever the upstream is not already sorted they accept all items, then sort them and then send to the accept method of the downstream instance.

So the conclusion from this is that the OpenJDK sorted implementation collects all items, then sorts, then sends downstream. In some cases this will be wasting resources when the downstream will then discard some elements. You are free to implement your own specialised sort operation that is more efficient than this for special cases. Probably the most straightforward way is to implement a Collector that keeps a list of the n largest or smallest items in the stream. Your operation might then look something like:

.collect(new CollectNthLargest(4)).stream()

To replace

.sorted().limit(4)
like image 66
sprinter Avatar answered Oct 22 '22 04:10

sprinter


There's a special collector in my StreamEx library which performs this operation: MoreCollectors.least(qty):

List<?> result = stream.collect(MoreCollectors.least(qty));

It uses PriorityQueue inside and actually works significantly faster with small qty on unsorted inputs. Note however if input is mostly sorted, then sorted().limit(qty) may work faster as TimSort is incredibly fast for presorted data.

like image 5
Tagir Valeev Avatar answered Oct 22 '22 04:10

Tagir Valeev


That's implementation-dependent and might also depend on whether the stream pipeline can "see through" potential operations between the sorted() and the limit().

Even if you were to ask about the OpenJDK implementation it is subject to change since the javadocs make no guarantees about the runtime behavior. But no, currently it does not implement a k-min selection algorithm.

You also have to keep in mind that sorted() doesn't work on infinite streams unless they already have the SORTED characteristic.

like image 3
the8472 Avatar answered Oct 22 '22 04:10

the8472