I'm trying to form a concise and coherent understanding of how lazy evaluation is applied within the Java streams API.
Here is what I currently understand:
What I want to do is bring all these ideas together and ensure I'm not misrepresenting anything. I'm finding it tricky because whenever I read any literature on Java streams, it goes on to say they're lazy or utilise lazy evaluation, and then very much interchangeably starts talking about optimisations such as fusion and short-circuiting.
So would I be right in saying the following?
fusion is how lazy evaluation has been implemented in the stream API - i.e. an element is consumed, and operations are fused together wherever possible. I'm thinking that if fusion didn't exist then surely we'd be back to eager evaluation as the alternative would just be to process all elements for each intermediate operation before moving onto the next?
Short-circuiting is possible without fusion or lazy evaluation, but in the context of streams it is very much helped by the implementation of these two principles?
I'd appreciate any further insight and clarity on this.
There are two types of operations in streams: some operations produce another stream as a result, and some produce a non-stream value as a result. So we can say that the Stream interface has a selection of terminal and non-terminal (intermediate) operations.
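To make that split concrete, here is a minimal sketch (the class name LazyPipelineDemo and the log list are only illustrative, not part of the streams API). It records when the map lambda actually runs. On a sequential stream, nothing should be logged by map until the terminal collect is invoked:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {

    // Builds a pipeline, then runs it, logging each step in the order it happens.
    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Intermediate operation: returns another Stream and only describes the work.
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> {
                    log.add("map " + s);
                    return s.toUpperCase();
                });
        log.add("pipeline built");

        // Terminal operation: produces a non-stream value and triggers the traversal.
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("terminal done, result size " + result.size());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "pipeline built" entry comes first in the log, before any "map ..." entry, which is the lazy part: the intermediate operation did no work when it was declared.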
A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
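As a rough illustration of that definition, the sketch below runs findFirst, a short-circuiting terminal operation, over an infinite Stream.iterate source. The class name ShortCircuitDemo and the visited counter are made up for the example; the counter is only there to show how many source elements were actually pulled:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitDemo {

    // Counts how many elements were pulled from the infinite source.
    static final AtomicInteger visited = new AtomicInteger();

    // Returns the first perfect square strictly greater than threshold.
    static int firstSquareAbove(int threshold) {
        visited.set(0);
        return Stream.iterate(1, n -> n + 1)          // infinite source: 1, 2, 3, ...
                .peek(n -> visited.incrementAndGet()) // observe each element pulled
                .map(n -> n * n)
                .filter(sq -> sq > threshold)
                .findFirst()                          // short-circuiting terminal operation
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        System.out.println(firstSquareAbove(50)); // prints 64
        System.out.println(visited.get());        // prints 8
    }
}
```

The "necessary but not sufficient" part shows up if a stage that needs all of its input first is put in front of the same terminal operation: for example, inserting sorted() before findFirst() here would keep the short-circuiting operation in the pipeline, yet the pipeline would not finish normally, because sorted() has to consume the whole infinite source before passing anything on.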
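If you have a small list, loops usually perform better. If you have a huge list, a parallel stream may perform better, provided the per-element work is large enough to outweigh the cost of splitting the work and merging the results. Purely thinking in terms of performance, you shouldn't use a for-each loop with an ArrayList, as it creates an extra Iterator instance that you don't need; an indexed loop avoids it (for LinkedList it's a different matter, since indexed access there is slow).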
Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.
As for fusion, let's imagine a map operation:
.map(x -> x.squash())
It's stateless and it just transforms any input according to the specified algorithm (in our case squashes them). Now the filter operation:
.filter(x -> x.getColor() != YELLOW)
It's also stateless and it just removes some elements (in our case yellow ones). Now let's have a terminal operation:
.forEach(System.out::println)
It just prints the input elements to the terminal. Fusion means that all the intermediate stateless operations are merged with the terminal consumer into a single operation:
.map(x -> x.squash())
.filter(x -> x.getColor() != YELLOW)
.forEach(System.out::println)
The whole pipeline is fused into a single Consumer, which is connected directly to the source. When every single element is processed, the source spliterator just executes the combined consumer; the stream pipeline does not intercept anything and does not perform any additional bookkeeping. That's fusion. Fusion does not depend on short-circuiting. It's possible to implement streams without fusion (execute one operation, take the result, execute the next operation, returning control to the stream engine after each operation). It's also possible to have fusion without short-circuiting.
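A rough way to picture this, using plain Integers instead of the squashable, coloured objects above: the first method below traces a real map/filter/forEach pipeline, and the second builds a hand-written nested Consumer that does the same work. The class FusionDemo and the hand-fused consumer are only a mental model under that simplification, not the JDK's actual internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class FusionDemo {

    // Real stream pipeline: each stage logs when it sees an element.
    static List<String> streamTrace(List<Integer> source) {
        List<String> trace = new ArrayList<>();
        source.stream()
              .map(x -> { trace.add("map " + x); return x * 10; })
              .filter(x -> { trace.add("filter " + x); return x != 20; })
              .forEach(x -> trace.add("sink " + x));
        return trace;
    }

    // Hand-fused equivalent: one Consumer that the source calls once per element.
    static List<String> fusedTrace(List<Integer> source) {
        List<String> trace = new ArrayList<>();
        Consumer<Integer> sink = x -> trace.add("sink " + x);
        Consumer<Integer> filterStage = x -> {
            trace.add("filter " + x);
            if (x != 20) {
                sink.accept(x);          // pass on only the elements that survive
            }
        };
        Consumer<Integer> fused = x -> {
            trace.add("map " + x);
            filterStage.accept(x * 10);  // hand the mapped value straight to the next stage
        };
        source.forEach(fused);           // the source drives the combined consumer directly
        return trace;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3);
        System.out.println(streamTrace(source));
        System.out.println(fusedTrace(source));
    }
}
```

Both traces come out the same: each element travels through map, filter and the sink before the next element is touched, rather than map running over the whole list first. There is no short-circuiting anywhere in this pipeline, which matches the point that fusion does not depend on it.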