
Java 8: First use of stream() or parallelStream() very slow - Usage in practice meaningful?

Over the last few days I ran some tests with external iteration, streams and parallel streams in Java 8 and measured the execution time. I also read about the warm-up time that I have to take into account. But one question remains.

The first time I call stream() or parallelStream() on a collection, the execution time is higher than for an external iteration. I already know that when I call stream() or parallelStream() repeatedly on the same collection and average the execution time, parallelStream() is indeed faster than the external iteration. But since in practice a collection is often iterated only once, I see only a disadvantage in using streams or parallel streams.

So my question is:

If I iterate a collection only once, is it a good idea to use stream() or parallelStream(), or will the execution time always be higher than for external iteration?

Veilchen4ever asked Sep 02 '14 13:09




1 Answer

Entirely coincidentally (apparently), Doug Lea, Brian Goetz, and several other folks have written a document called Stream Parallel Guidance. (This is only a draft.) It does have some useful discussion about when to use parallel vs. sequential streams.

A brief summary: a parallel stream is more expensive to start up than a sequential stream. If your workload is splittable, and you have multiple CPU cores that can be brought to bear on the problem, and if the per-element cost isn't unreasonably small, you'll get a parallel speedup with a sufficiently large workload. (How's that for a lot of conditionals?) Oh, and you also have to be careful about benchmarking.
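To make those conditions concrete, here is a minimal sketch, not taken from the guidance document. It assumes a hypothetical per-element task (a trial-division primality test named isPrime) that is expensive enough for splitting to have a chance of paying off. The class name, method names and the limit are all illustrative. The timing in main is deliberately naive and only shows the shape of an experiment; for real measurements a harness such as JMH is the safer choice.

```java
import java.util.stream.LongStream;

final class ParallelSketch {

    // Hypothetical per-element work (trial division), chosen only because
    // it is not trivially cheap. It is an assumption for illustration.
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Sequential pipeline: one thread walks the whole range.
    static long countPrimesSequential(long limit) {
        return LongStream.rangeClosed(1, limit)
                .filter(ParallelSketch::isPrime)
                .count();
    }

    // Parallel pipeline: the range is split and processed on the common
    // fork/join pool. A numeric range splits cleanly, which matters here.
    static long countPrimesParallel(long limit) {
        return LongStream.rangeClosed(1, limit)
                .parallel()
                .filter(ParallelSketch::isPrime)
                .count();
    }

    public static void main(String[] args) {
        long limit = 200_000; // illustrative size, not a recommendation

        // Naive one-shot timing: includes JIT warm-up and class loading,
        // so treat the numbers as a rough impression, not a benchmark.
        long t0 = System.nanoTime();
        long sequential = countPrimesSequential(limit);
        long t1 = System.nanoTime();
        long parallel = countPrimesParallel(limit);
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d primes in %d ms%n",
                sequential, (t1 - t0) / 1_000_000);
        System.out.printf("parallel:   %d primes in %d ms%n",
                parallel, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same count; whether the parallel version is actually faster depends on the core count, the size of the range and the per-element cost, exactly as the summary says.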

StackOverflow is littered with questions that attempt to add up a few integers in parallel and then claim that parallel streams are no good because they don't provide any speedup. I won't even bother linking to them.

Now, you had asked about "external iteration" (basically a for-loop) vs streams, parallel or sequential. I think it's important to consider parallel vs sequential streams first, as I've done above. This will help inform further decisions. Clearly, if there is a possibility you'll need to run things in parallel, then you should probably go with streams, even if you initially start sequentially.

Even if you don't intend to go parallel, there are still a number of considerations between for-loops and sequential streams. There is a certain amount of overhead of streams compared to conventional loops -- especially for-loops over an array. But this is usually amortized over the workload. Even if the collection is iterated only once, amortization of the setup can occur if the number of elements in the collection is sufficiently large. For example, if the collection has 10 elements, the extra setup cost of a stream probably isn't worth it. If the collection has 10,000 elements, it might be a different story.

For-loops over arrays are particularly fast because the only "setup" is initializing loop counters and limit values in registers. JIT compilers can bring many loop optimizations to bear as well. It's rare for sequential streams to beat a for-loop over an array, though it can happen.
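As an illustration of the two forms being compared, here is a small sketch with hypothetical names (ArrayLoopSketch, sumWithLoop, sumWithStream). It only shows that the results agree; it makes no claim about which is faster on a given JVM.

```java
import java.util.Arrays;

final class ArrayLoopSketch {

    // Plain indexed loop: the only setup is the counter and the length.
    static long sumWithLoop(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Sequential stream over the same array. Arrays.stream(int[]) yields
    // an IntStream; asLongStream() widens so the sum cannot overflow int.
    static long sumWithStream(int[] data) {
        return Arrays.stream(data).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sumWithLoop(data));
        System.out.println(sumWithStream(data));
    }
}
```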

For-loops over collections usually involve creating an iterator and thus have somewhat more overhead than array-based loops. In particular, each iteration on an iterator involves method calls to hasNext and next, whereas a stream can get each element with a single method call. For this reason there are times a sequential stream can beat an iterator-based loop (given the right per-element workload, a sufficiently large number of elements, etc.). So even though there is some setup cost for a stream, there is also the possibility that it might end up running faster than a conventional for-loop.
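The difference is easier to see with the iterator written out explicitly. The sketch below uses hypothetical names (IterationSketch, sumWithIterator, sumWithStream). The first method is roughly what an enhanced for-loop over a collection expands to; the second hands the traversal to the stream, which can push each element to the pipeline internally.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IterationSketch {

    // External iteration: two method calls (hasNext, next) per element.
    static int sumWithIterator(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Internal iteration: the stream drives the traversal itself.
    static int sumWithStream(List<Integer> values) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumWithIterator(values));
        System.out.println(sumWithStream(values));
    }
}
```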

Finally, performance isn't the only consideration. There is also readability and maintainability. The streams and lambda stuff may initially be new and unfamiliar, but it has great potential to simplify and clean up code. See my answer to another question, for example.
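As a small illustration of the readability point (this is a made-up example, not the linked answer), here is the same filter-and-transform task written both ways. The names ReadabilitySketch, longNamesLoop and longNamesStream are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ReadabilitySketch {

    // Loop version: the intent is spread across the loop body.
    static List<String> longNamesLoop(List<String> names, int minLength) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() >= minLength) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Stream version: each step (filter, map, collect) is named.
    static List<String> longNamesStream(List<String> names, int minLength) {
        return names.stream()
                .filter(name -> name.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Robert", "Li");
        System.out.println(longNamesLoop(names, 3));
        System.out.println(longNamesStream(names, 3));
    }
}
```

Which version reads better is partly a matter of taste and familiarity; the stream form tends to pay off more as the number of steps grows.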

Stuart Marks answered Oct 28 '22 22:10
