Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is flatMap guaranteed to be lazy? [duplicate]

Consider the following code:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

Will fetchDataFromInternet be called for second url when the first one was enough?

I tried with a smaller example and it looks like working as expected. i.e processes data one by one but can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?

    Stream.of("one", "two", "three")
            .flatMap(num -> {
                System.out.println("Processing " + num);
                // return FetchFromInternetForNum(num).data().stream();
                return Stream.of(num);
            })
            .peek(num -> System.out.println("Peek before filter: "+ num))
            .filter(num -> num.length() > 0)
            .peek(num -> System.out.println("Peek after filter: "+ num))
            .forEach(num -> {
                System.out.println("Done " + num);
            });

Output:

Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three

Update: Using official Oracle JDK8 if that matters on implementation

Answer: Based on the comments and the answers below, flatmap is partially lazy. i.e reads the first stream fully and only when required, it goes for next. Reading a stream is eager but reading multiple streams is lazy.

If this behavior is intended, the API should let the function return an Iterable instead of a stream.

In other words: link

like image 304
balki Avatar asked Sep 18 '17 22:09

balki


People also ask

Is flatMap lazy Java?

Answer: Based on the comments and the answers below, flatmap is partially lazy. i.e reads the first stream fully and only when required, it goes for next. Reading a stream is eager but reading multiple streams is lazy.

What is difference between flat and flatMap?

The difference is that the map operation produces one output value for each input value, whereas the flatMap operation produces an arbitrary number (zero or more) values for each input value.

Does flatMap preserve order?

Does flatmap() method preserve the order of the streams? Yes, It does and map() also.

What does the flatMap method do?

In short, we can say that the flatMap() method helps in converting Stream<Stream<T>> to Stream<T>. It performs flattening (flat or flatten) and mapping (map), simultaneously. The Stream. flatMap() method combines both the operations i.e. flat and map.


3 Answers

Under the current implementation, flatmap is eager; like any other stateful intermediate operation (like sorted and distinct). And it's very easy to prove :

 int result = Stream.of(1)
            .flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
            .findFirst()
            .get();

    System.out.println(result);

This never finishes as flatMap is computed eagerly. For your example:

urls.stream()
    .flatMap(url -> fetchDataFromInternet(url).stream())
    .filter(...)
    .findFirst()
    .get();

It means that for each url, the flatMap will block all others operation that come after it, even if you care about a single one. So let's suppose that from a single url your fetchDataFromInternet(url) generates 10_000 lines, well your findFirst will have to wait for all 10_000 to be computed, even if you care about only one.

EDIT

This is fixed in Java 10, where we get our laziness back: see JDK-8075939

EDIT 2

This is fixed in Java 8 too (8u222): JDK-8225328

like image 51
Eugene Avatar answered Oct 23 '22 12:10

Eugene


It’s not clear why you set up an example that does not address the actual question, you’re interested in. If you want to know, whether the processing is lazy when applying a short-circuiting operation like findFirst(), well, then use an example using findFirst() instead of forEach that processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:

Stream.of("hello", "world")
      .flatMap(s -> {
          System.out.println("flatMap function evaluated for \""+s+'"');
          return s.chars().boxed();
      })
      .peek(c -> System.out.printf("processing element %c%n", c))
      .filter(c -> c>'h')
      .findFirst()
      .ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l

This demonstrates that the function passed to flatMap gets evaluated lazily as expected while the elements of the returned (sub-)stream are not evaluated as lazy as possible, as already discussed in the Q&A you have linked yourself.

So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.

like image 30
Holger Avatar answered Oct 23 '22 13:10

Holger


Today I also stumbled up on this bug. Behavior is not so strait forward, cause simple case, like below, is working fine, but similar production code doesn't work.

 stream(spliterator).map(o -> o).flatMap(Stream::of)..flatMap(Stream::of).findAny()

For guys who cannot wait another couple years for migration to JDK-10 there is a alternative true lazy stream. It doesn't support parallel. It was dedicated for JavaScript translation, but it worked out for me, cause interface is the same.

StreamHelper is collection based, but it is easy to adapt Spliterator.

https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java

like image 45
Daniil Iaitskov Avatar answered Oct 23 '22 12:10

Daniil Iaitskov