
Java 8 streams - each step in the chain evaluated for entire input or do items get passed through?

Say I have this trivial program:

    List<String> input = Arrays.asList("1", "2", "3");
    List<String> result = input.stream()
            .map(x -> x + " " + x)
            .filter(y -> !y.startsWith("1"))
            .collect(Collectors.toList());

Behind the scenes, does it work like A or B?

A

map
  "1" + " " + "1"
  "2" + " " + "2"
  "3" + " " + "3"
filter
  "1 1" does not begin with "1"? = false
  "2 2" does not begin with "1"? = true
  "3 3" does not begin with "1"? = true
collect
  add "2 2" to list
  add "3 3" to list
result = List("2 2", "3 3")

B

map
  "1" + " " + "1"
filter
  "1 1" does not begin with "1"? = false
map
  "2" + " " + "2"
filter
  "2 2" does not begin with "1"? = true
collect
  add "2 2" to list
map
  "3" + " " + "3"
filter
  "3 3" does not begin with "1"? = true
collect
  add "3 3" to list
result = List("2 2", "3 3")
djhworld asked Mar 29 '14 19:03


2 Answers

It works like option B. The steps do not necessarily happen in exactly that order, but the key point is that every operation is applied to one element at a time.

The reasoning behind this is that elements pass through the stream only once, so all actions have to be performed while the element is at hand; once it has passed, it is gone forever (from the stream's point of view).
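One way to see this for yourself is to log each step as it runs. The sketch below reruns the pipeline from the question and records every call; the class name StreamTraceDemo and the log list are made up for illustration, and peek() is only used here as a stand-in for "this element is about to be collected".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question or answers.
class StreamTraceDemo {
    // Runs the question's pipeline sequentially and records each step in call order.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<String> result = Arrays.asList("1", "2", "3").stream()
                .map(x -> { log.add("map " + x); return x + " " + x; })
                .filter(y -> { log.add("filter " + y); return !y.startsWith("1"); })
                .peek(z -> log.add("collect " + z)) // element is on its way into collect()
                .collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints: map 1, filter 1 1, map 2, filter 2 2, collect 2 2,
        //         map 3, filter 3 3, collect 3 3, result [2 2, 3 3]
        trace().forEach(System.out::println);
    }
}
```

The log interleaves map, filter and collect per element, which is option B. Writing to a plain ArrayList from the lambdas is only safe because the stream is sequential.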

In a sequential setting, your code is very roughly equivalent to the following. This is a heavily simplified version, but I hope you get the idea:

    Collection<String> input = Arrays.asList("1", "2", "3");
    Function<String, String> mapper = x -> x + " " + x;      // the map() step
    Predicate<String> filter = y -> !y.startsWith("1");      // the filter() step
    Collector<String, ?, List<String>> toList = Collectors.toList();

    // The wildcard in the collector type is why the unchecked casts below are needed.
    List<String> list = ((Supplier<List<String>>) toList.supplier()).get();
    for (String in : input) {
        in = mapper.apply(in);              // map one element
        if (filter.test(in)) {              // filter that same element
            // collect it before moving on to the next element
            ((BiConsumer<List<String>, String>) toList.accumulator()).accept(list, in);
        }
    }

What you see here, is:

  • As input a Collection<String>, your input.
  • A Function<String, String> matching your map().
  • A Predicate<String> matching your filter().
  • A Collector<String, ?, List<String>> matching your collect(). This is a collector that operates on elements of type String, uses an intermediate storage type ?, and produces a List<String>.

What it then does is:

  • Obtain a new list, from the supplier (type: Supplier<List<String>>) of the collector.
  • Loop over every element of the input. This is done internally when operating on a Stream<String>; I am using a Collection<String> here for explicitness, so that we still have a connection to the old Java 7 world.
  • Apply your mapping function.
  • Test your filter predicate.
  • Obtain the accumulator (type: BiConsumer<List<String>, String>) of the toList collector. This is the binary consumer that takes as arguments the List<String> it already has and the String it wants to add.
  • Feed our list and in to the accumulator.
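The same steps can also be written as a self-contained program. This is only a sketch: the class ManualCollect and the helper mapFilterCollect are hypothetical names, and giving the collector's intermediate type a name (A) is just one way to avoid the unchecked casts. It also calls the collector's finisher(), which for Collectors.toList() simply returns the list as is.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: a hand-written loop doing map, filter, collect per element.
class ManualCollect {
    static <T, A, R> R mapFilterCollect(Iterable<T> input,
                                        Function<T, T> mapper,
                                        Predicate<T> filter,
                                        Collector<T, A, R> collector) {
        A container = collector.supplier().get();            // obtain a new container
        BiConsumer<A, T> accumulator = collector.accumulator();
        for (T element : input) {
            T mapped = mapper.apply(element);                 // apply the mapping function
            if (filter.test(mapped)) {                        // test the filter predicate
                accumulator.accept(container, mapped);        // feed container and element
            }
        }
        return collector.finisher().apply(container);         // turn the container into the result
    }

    static List<String> run() {
        List<String> input = Arrays.asList("1", "2", "3");
        Function<String, String> mapper = x -> x + " " + x;
        Predicate<String> filter = y -> !y.startsWith("1");
        Collector<String, ?, List<String>> toList = Collectors.toList();
        return mapFilterCollect(input, mapper, filter, toList);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [2 2, 3 3]
    }
}
```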

Please note carefully that the real implementation is much more advanced: operations may run in a different order, several may run at the same time, and much more.
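One concrete case where the per-element picture changes is a stateful intermediate operation such as sorted(), which has to see all elements before it can pass any of them on. The sketch below is an illustration only (SortedBarrierDemo is a made-up name, and the input order "3", "1", "2" is chosen so that sorting has something to do):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class showing sorted() acting as a barrier in a sequential stream.
class SortedBarrierDemo {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of("3", "1", "2")
                .map(x -> { log.add("map " + x); return x + " " + x; })
                .sorted() // buffers every mapped element before emitting any
                .filter(y -> { log.add("filter " + y); return !y.startsWith("1"); })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Prints: map 3, map 1, map 2, filter 1 1, filter 2 2, filter 3 3
        trace().forEach(System.out::println);
    }
}
```

Everything before sorted() runs for all elements first, and everything after it then runs element by element again.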

skiwi answered Oct 25 '22 23:10


One of the benefits of streams is lazy evaluation of intermediate operations. This means that when the terminal operation (collect() in this case) is executed, it asks for an element from the previous intermediate operation, filter(), which in turn gets the element from map(), which in turn operates on the first element from list.stream(). The same flow is followed for all the elements. So yes, the execution is more like option B.
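A small sketch of what lazy evaluation means in practice: building the pipeline calls nothing, and a short-circuiting terminal operation only pulls as many elements as it needs. The class LazyDemo and the call counter are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class counting how often the map() function is actually called.
class LazyDemo {
    // Returns {calls before collect(), calls after collect()}.
    static int[] callsAroundCollect() {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pipeline = Arrays.asList("1", "2", "3").stream()
                .map(x -> { calls.incrementAndGet(); return x + " " + x; })
                .filter(y -> !y.startsWith("1"));
        int before = calls.get();                  // no terminal operation yet
        List<String> result = pipeline.collect(Collectors.toList());
        int after = calls.get();                   // all three elements were mapped
        return new int[] { before, after };
    }

    // findFirst() stops pulling as soon as one element passes the filter.
    static int callsUntilFirstMatch() {
        AtomicInteger calls = new AtomicInteger();
        Arrays.asList("1", "2", "3").stream()
                .map(x -> { calls.incrementAndGet(); return x + " " + x; })
                .filter(y -> !y.startsWith("1"))
                .findFirst();
        return calls.get();
    }

    public static void main(String[] args) {
        int[] counts = callsAroundCollect();
        System.out.println(counts[0] + " " + counts[1]);  // prints 0 3
        System.out.println(callsUntilFirstMatch());       // prints 2
    }
}
```

With option A, findFirst() would have had to map all three elements; here "3" is never touched.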

Also, since the collector returned by Collectors.toList() is ordered (it does not have the UNORDERED characteristic), the elements are guaranteed to end up in the result in the encounter order of the list. In some cases the evaluation might go out of order, for example when the UNORDERED characteristic is set for a collector.
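The characteristics can be inspected directly. The sketch below (OrderingDemo is a made-up name) compares Collectors.toList() with Collectors.toSet(), and runs the question's pipeline on a parallel stream to show that the result order is still the encounter order, even though the individual map and filter calls may then run on different threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class for collector characteristics and encounter order.
class OrderingDemo {
    static boolean toListIsUnordered() {
        return Collectors.toList().characteristics()
                .contains(Collector.Characteristics.UNORDERED);
    }

    static boolean toSetIsUnordered() {
        return Collectors.toSet().characteristics()
                .contains(Collector.Characteristics.UNORDERED);
    }

    // Same pipeline as in the question, but on a parallel stream.
    static List<String> parallelResult() {
        return Arrays.asList("1", "2", "3").parallelStream()
                .map(x -> x + " " + x)
                .filter(y -> !y.startsWith("1"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toListIsUnordered()); // prints false
        System.out.println(toSetIsUnordered());  // prints true
        System.out.println(parallelResult());    // prints [2 2, 3 3]
    }
}
```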

Rohit Jain answered Oct 26 '22 01:10