Java 8 streams allow code that is a lot more readable than old-fashioned for loops, in most cases. However, based on my own experience and what I've read, using a stream instead of a for loop can involve a performance hit (or occasionally an improvement) which is sometimes difficult to predict.
In a large project it doesn't seem feasible to write a benchmark test for every loop, so when deciding whether to replace a for loop with a stream, what are the key factors (e.g. expected size of the collection, expected percentage of values removed by filtering, complexity of iterative operations, the type of reduction or aggregation, etc.) which give a likely indication of the performance change that will result?
Note: this is a narrowing of my earlier question, which was closed for being too broad (and for which the aspects of parallel streams were pretty well covered in another SO question), so let's just limit this to sequential streams.
Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.
Remember that loops use an imperative style and streams a declarative style, so stream code is likely to be easier to maintain. If you have a small list, loops tend to perform better. If you have a huge list, a parallel stream may perform better.
Streams are lazy because intermediate operations are not evaluated until a terminal operation is invoked. Each intermediate operation creates a new stream, stores the provided operation/function, and returns the new stream. The pipeline accumulates these newly created streams.
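The laziness described above can be observed directly. The following is only a minimal sketch, not a benchmark; the class name `LazinessDemo`, the `visited` list, and the `buildPipeline` helper are made up for illustration. The filter records every element it sees, so you can check that nothing is recorded until the terminal operation runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
class LazinessDemo {
    // Records every element the filter predicate has actually been applied to.
    static final List<Integer> visited = new ArrayList<>();

    // Builds the pipeline only: filter() and map() store their functions
    // and return a new Stream, but no element is processed here.
    static Stream<Integer> buildPipeline(List<Integer> source) {
        return source.stream()
                .filter(n -> { visited.add(n); return n % 2 == 0; })
                .map(n -> n * 10);
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline(Arrays.asList(1, 2, 3, 4));
        System.out.println("visited before terminal op: " + visited); // []

        // collect() is the terminal operation that triggers evaluation.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println("visited after terminal op: " + visited);  // [1, 2, 3, 4]
        System.out.println("result: " + result);                      // [20, 40]
    }
}
```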
In Java 8 Streams, performance is achieved through parallelism, laziness, and short-circuit operations, but there is a downside as well: we need to be cautious when choosing Streams, as they may degrade the performance of your application.
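To illustrate what a short-circuit operation buys you, here is a small sketch (the class `ShortCircuitDemo`, the `tested` counter, and `firstMultipleOf` are hypothetical names chosen for this example). The source is an infinite stream, yet `findFirst()` stops pulling elements as soon as one passes the filter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of any library.
class ShortCircuitDemo {
    // Counts how many elements were actually pulled from the source.
    static final AtomicInteger tested = new AtomicInteger();

    // Returns the first positive multiple of the given divisor.
    static int firstMultipleOf(int divisor) {
        return IntStream.iterate(1, n -> n + 1)        // infinite: 1, 2, 3, ...
                .peek(n -> tested.incrementAndGet())   // count each element pulled
                .filter(n -> n % divisor == 0)
                .findFirst()                           // short-circuits on first match
                .getAsInt();
    }

    public static void main(String[] args) {
        int result = firstMultipleOf(7);
        System.out.println("result: " + result);                // 7
        System.out.println("elements tested: " + tested.get()); // 7
    }
}
```

A hand-written loop with an early `break` does the same amount of work, which is consistent with the point that sequential streams are not inherently faster than loops.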
It’s not only “not feasible to write a benchmark test for every loop”, it’s counterproductive. A particular, application-specific loop may perform entirely differently when put into a micro-benchmark.
For an actual application, the standard rule of optimization applies: don’t do it. Just write whatever is more readable and only if there is a performance problem, profile the entire application to check whether a particular loop or stream use really is the bottleneck. Only if this is the case, you may try to switch between both idioms at the particular bottleneck to see whether it makes a difference.
In most cases, it won’t. If there is a real performance issue, it will stem from the type of operation, e.g. performing a nested iteration with an O(n²) time complexity, etc. Such problems do not depend on whether you use a Stream or a for loop, and the minor performance differences between these two idioms don’t change how your code scales.
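As a rough illustration of that point (the class and method names below are invented for this sketch), here is a list intersection written three ways. The loop and the first stream version both hide a linear `List.contains` scan per element, so both are quadratic; the fix is changing the data structure, not the idiom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
class ComplexityDemo {
    // O(n*m): List.contains scans b linearly for every element of a.
    static List<Integer> commonLoop(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : a) {
            if (b.contains(x)) {
                out.add(x);
            }
        }
        return out;
    }

    // Still O(n*m): the same linear scan, just expressed as a filter.
    static List<Integer> commonStream(List<Integer> a, List<Integer> b) {
        return a.stream().filter(b::contains).collect(Collectors.toList());
    }

    // Expected O(n+m): hash lookups replace the inner scan.
    static List<Integer> commonHashed(List<Integer> a, List<Integer> b) {
        Set<Integer> lookup = new HashSet<>(b);
        return a.stream().filter(lookup::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> b = Arrays.asList(4, 5, 6);
        System.out.println(commonLoop(a, b));   // [4, 5]
        System.out.println(commonStream(a, b)); // [4, 5]
        System.out.println(commonHashed(a, b)); // [4, 5]
    }
}
```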
There aren't big general speed differences between streams and loops; their advantages and disadvantages are problem-specific. Whether you choose one or the other should depend (mostly) on the readability of the code. For some performance comparisons, see Benchmark1 and Benchmark2, where you can notice Brian Goetz's comment on one of the answers:
Your conclusion about performance, while valid, is overblown. There are plenty of cases where the stream code is faster than the iterative code, largely because per-element access costs is cheaper with streams than with plain iterators. And in many cases, the streams version inlines to something that is equivalent to the hand-written version. Of course, the devil is in the details; any given bit of code might behave differently.
Apart from that, just make sure that when you benchmark, you use JMH (the Java Microbenchmark Harness).