Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream with sorted() before findFirst() is no longer lazy

I have a list of elements, I need to find the first element that satisfy the condition then exit using Java 8 streams.

I think the following code unfortunately evaluate all available element which doesn't what i need, I need to evaluate items one by one and stop (break) when find the first match:

I am here sorting the elements, then map the element to its url attribute then try to filter if the url is not null or empty then find first match!

Arrays.stream(dataArray)
.sorted(Comparator.comparing(d -> d.getPriority()))
.peek(o -> System.out.println("SORT: " + o))
.map(d -> d.getOriginalURL(shortUrl))
.peek(o -> System.out.println("MAP: " + o))
.filter(u -> u != null && !u.isEmpty())
.peek(o -> System.out.println("FILTER: " + o))
.findFirst().orElse("");

But the output shows that, all items are evaulated even if the first one matches the if condition (filter) operation.

Data[] data = new Data[] { new ParseData(), new InMemoryData() };
System.out.println(">>> " + getOriginalURL(data, ""));

OUTPUT:

SORT: mhewedy.usingspark.data.InMemoryData@7adf9f5f
MAP: InMemory URL
FILTER: InMemory URL
SORT: mhewedy.usingspark.data.ParseData@85ede7b
MAP: Parse.com URL         <<< THIS SHOULD NOT HAPPEN
FILTER: Parse.com URL      <<< AND THIS TOO
>>> InMemory URL

As output shows, the stream doesn't stop when the filter matches with the first element, instead it continue evaluating the second element too!

I want to do like this:

Arrays.sort(dataArray, Comparator.comparing(d -> d.getPriority())); // sort

for (Data data : dataArray) {
    String url = data.getOriginalURL(shortUrl);           // map
    if (url != null && !url.isEmpty()) {                  // filter
        System.out.println("url :" + url);            
        return url;                                   // find first
    }
}
like image 367
Muhammad Hewedy Avatar asked May 02 '14 01:05

Muhammad Hewedy


People also ask

Is stream sorted stable?

Stream sorted() in JavaFor ordered streams, the sort method is stable but for unordered streams, no stability is guaranteed. It is a stateful intermediate operation i.e, it may incorporate state from previously seen elements when processing new elements.

Are Java streams lazy?

Streams are lazy because intermediate operations are not evaluated until terminal operation is invoked. Each intermediate operation creates a new stream, stores the provided operation/function and return the new stream.

How do I use stream findFirst?

Stream findFirst() in Java with examples Stream findFirst() returns an Optional (a container object which may or may not contain a non-null value) describing the first element of this stream, or an empty Optional if the stream is empty. If the stream has no encounter order, then any element may be returned.

What is findFirst method?

The findFirst() method returns the first element of a stream or an empty Optional. If the stream has no encounter order, any element is returned, as it's ambiguous which is the first one anyway. The findAny() method returns any element of the stream - much like findFirst() with no encounter order.


1 Answers

Here's a smaller example that illustrates the issue:

Stream.of("a", "ab", "abc", "abcd")
    // .sorted() // uncomment and what follows becomes eager
    .filter(s -> s.contains("b"))
    .peek(s -> System.out.println("PEEK: " + s))
    .findFirst()
    .orElse("X");

As expected the output is:

PEEK: ab

If the sorted line is uncommented, the output is:

PEEK: ab
PEEK: abc
PEEK: abcd

(The final result of the entire pipeline is "ab" in both cases, as expected.)

It's true that sorted must consume all of its input before producing its first output element. In that sense it's eager. However, it does seem strange that it affects how elements are sent downstream.

Without sorting, the findFirst operation "pulls" elements from upstream until it finds one, and then it stops. With sorting, the sorted() operation eagerly gathers all the elements, sorts them, and since it has them all right there, it "pushes" them down the stream. Of course, findFirst ignores all but the first element. But this means that intervening operations (such as the filter) may do unnecessary work.

The final result is correct, but the behavior is unexpected. This might be considered a bug. I'll investigate and file a bug if appropriate.

like image 189
Stuart Marks Avatar answered Oct 24 '22 20:10

Stuart Marks