Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the order in which stream operations are applied to list elements? [duplicate]

Suppose we have a standard method chain of stream operations:

Arrays.asList("a", "bc", "def").stream()
  .filter(e -> e.length() != 2)
  .map(e -> e.length())
  .forEach(e -> System.out.println(e));

Are there any guarantees in the JLS regarding the order in which stream operations are applied to the list elements?

For example, is it guaranteed that:

  1. Applying the filter predicate to "bc" is not going to happen before applying the filter predicate to "a"?
  2. Applying the mapping function to "def" is not going to happen before applying the mapping function to "a"?
  3. 1 will be printed before 3?

Note: I am talking here specifically about stream(), not parallelStream() where it is expected that operations like mapping and filtering are done in parallel.

like image 414
Dragan Bozanovic Avatar asked Mar 19 '16 14:03

Dragan Bozanovic


People also ask

How does stream detect duplicate values in a List?

Get the stream of elements in which the duplicates are to be found. For each element in the stream, count the frequency of each element, using Collections. frequency() method. Then for each element in the collection list, if the frequency of any element is more than one, then this element is a duplicate element.

Is List stream ordered?

Stream Ordering: Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.

How do I find duplicates in a List of strings in Java 8?

In Java 8 Stream, filter with Set. Add() is the fastest algorithm to find duplicate elements, because it loops only one time. Set<T> items = new HashSet<>(); return list.

What are the operations in stream?

There are two types of operations in streams, some operations produce another stream as a result and some operations produce non-stream values as a result. So we can say that stream interface has a selection of terminal and non-terminal operations.


3 Answers

Everything you want to know can be found within the java.util.stream JavaDoc.

Ordering

Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result.

For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results.

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.

like image 89
Yassin Hajaj Avatar answered Oct 06 '22 01:10

Yassin Hajaj


Are there any guarantees in the JLS regarding the order in which stream operations are applied to the list elements?

The Streams library is not covered by the JLS. You would need to read the Javadoc for the library.

Streams also support parallel stream and the order in which things are processed depends on the implementations.

Applying the filter predicate to "bc" is not going to happen before applying the filter predicate to "a"?

It would be reasonable to assume that it would, but you can't guarantee it, nor should you be writing code which requires this guarantee otherwise you wouldn't be able to parallelise it later.

applying the mapping function to "def" is not going to happen before applying the mapping function to "a"?

It is safe assume this does happen, but you shouldn't write code which requires it.

like image 24
Peter Lawrey Avatar answered Oct 06 '22 02:10

Peter Lawrey


There is no guarantee of the order in which list items are passed to predicate lambdas. Stream documentation makes guarantees regarding the output of streams, including the order of encounter; it does not make guarantees about implementation details, such as the order in which filter predicates are applied.

Therefore, the documentation does not prevent filter from, say, reading several elements, running the predicate on them in reverse order, and then sending the elements passing the predicate to the output of the stream in the order in which they came in. I don't know why filter() would do something like that, but doing so wouldn't break any guarantee made in the documentation.

You can make pretty strong inference from the documentation that filter() would call predicate on the elements in the order in which collection supplies them, because you are passing the result of calling stream() on a list, which calls Collection.stream(), and, according to Java documentation, guarantees that Stream<T> produced in this way is sequential:

Returns a sequential Stream with this collection as its source.

Further, filter() is stateless:

Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element - each element can be processed independently of operations on other elements.

Therefore it is rather likely that filter would call the predicate on elements in the order they are supplied by the collection.

I am talking here specifically about stream(), not parallelStream()

Note that Stream<T> may be unordered without being parallel. For example, calling unordered() on a stream(), the result becomes unordered, but not parallel.

like image 34
Sergey Kalinichenko Avatar answered Oct 06 '22 02:10

Sergey Kalinichenko