Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream.sorted() then collect, or collect then List.sort()? [duplicate]

In general, is there a performance difference between these two pieces of code?

List<Integer> list1 = someStream1.sorted().collect(toList());
// vs.
List<Integer> list2 = someStream2.collect(toList());
list2.sort(Comparator.naturalOrder())

Variant 2 is obviously yucky and should be avoided, but I'm curious if there are any performance optimizations built into the mainstream (heh, mainstream) implementations of Stream that would result in a performance difference between these two.

I imagine that because the stream has strictly more information about the situation, it would have a better opportunity to optimize. E.g. I imagine if this had a findFirst() call tacked on, it would elide the sort, in favor of a min operation.

like image 794
Alexander Avatar asked Sep 21 '18 18:09

Alexander


3 Answers

Both options should result in the same final result. But runtime characteristics could be different. What if the initial stream is a parallel one? Then option 1 would do a sort in parallel, whereas option 2 wouldn't do a "sequential" sort. The result should be the same, but the overall runtime resp. CPU load could be very much different then.

I would definitely prefer option 1 over 2: why create a list first, to then later sort it?!

Imagine for example you later want to collect into an immutable list. Then all code that follows your second pattern would break. Whereas code written using pattern 1 wouldn't be affected at all!

Of course, in the example here that shouldn't lead to issues, but what if the sort() happens in a slightly different place?!

like image 104
GhostCat Avatar answered Sep 19 '22 23:09

GhostCat


Conceptually, streams are usually looked at as "transient" data that's being processed/manipulated, and collecting the stream conveys the notion you're done manipulating it.

While the second snippet should work, the first one would be the more idiomatic way of doing things.

like image 37
Mureinik Avatar answered Sep 22 '22 23:09

Mureinik


In the first case the sorting happens in the call to collect. If the stream is already sorted this will be a no-op (the data will just pass through as-is). Might not make a big difference, but calling Collections.sort on an already sorted collection is still O(n).

Also the first case benefits from parallel execution, as at least OpenJDK uses Arrays.parallelSort.

Apart from that the first line is cleaner, better to understand and less error prone when refactoring.

like image 24
Max Vollmer Avatar answered Sep 22 '22 23:09

Max Vollmer