Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream intermediate operations ordering

Is there a guarantee that, when working with a stream, intermediate operations will be executed in program order? I suspect it is the case or it would lead to very subtle bugs but I could not find a definite answer.

Example:

List<String> list = Arrays.asList("a", "b", "c");
List<String> modified = list.parallelStream()
        .map(s -> s + "-" + s)                 //"a-a", "b-b", "c-c"
        .filter(s -> !s.equals("b-b"))         //"a-a", "c-c"
        .map(s -> s.substring(2))              //"a", "c"
        .collect(toList());

Is this guaranteed to always return ["a", "c"] or ["c", "a"]? (if the last map operation is executed before the first map operation, that could throw an exception - similarly if the filter is executed after the second map operation, "b" will be retained in the final list)

like image 738
assylias Avatar asked Feb 26 '14 16:02

assylias


2 Answers

There are actually several questions about ordering embedded within the original question.

Holger's answer covers the ordering of stream operations within a pipeline. For a particular stream element the pipeline operations must be performed as written in the program, because in general, the types must match up, and also, because it kind of doesn't make sense to do it any other way. Starting with the original example, the streams library cannot reorder the operations as if they had been written,

List<String> modified = list.parallelStream()
    .filter(s -> !s.equals("b-b")) // these two operations are swapped
    .map(s -> s + "-" + s)         // compared to the original example
    .map(s -> s.substring(2))
    .collect(toList());

because then the result would be [a, b, c]. This won't happen.

The original question asked about whether the answer could be [c, a] instead [a, c]. This is actually a question about a different kind of ordering, which we refer to as encounter order. This concept is mentioned in the java.util.stream package documentation. Unfortunately it isn't crisply defined anywhere that I'm aware of. Briefly, it relates to the relative positioning of elements within the stream (as opposed to execution order) and whether this positioning has any semantics.

For example, consider streams sourced from a HashSet and from an ArrayList. A stream based on a HashSet has no defined encounter order, or put another way, it is unordered. If you put a bunch of elements into a HashSet and then iterate them out, they'll come out in some order that's probably unrelated to the order you put them in.

A stream based on a List, however, does have a defined encounter order. In the original example, the list is [a, b, c], and clearly "a" comes before "b" which comes before "c". This positioning is generally preserved by stream operations from the source through to the output.

Let me modify the original example to show the significance of encounter order. All I've done is to change the order of the strings in the original list:

List<String> list = Arrays.asList("c", "b", "a");
List<String> modified = list.parallelStream()
    .map(s -> s + "-" + s)                 //"c-c", "b-b", "a-a"
    .filter(s -> !s.equals("b-b"))         //"c-c", "a-a"
    .map(s -> s.substring(2))              //"c", "a"
    .collect(toList());

As we expect, the output is [c, a]. Now let's run the stream over a set instead of a list:

List<String> list = Arrays.asList("c", "b", "a");
Set<String> set = new HashSet<>(list);
List<String> modified = set.parallelStream()
    .map(s -> s + "-" + s)
    .filter(s -> !s.equals("b-b"))
    .map(s -> s.substring(2))
    .collect(toList());

This time, the result is [a, c]. The pipeline operations (map, filter, map) haven't changed order, but since the encounter order of elements in the set is undefined, the results end up in the destination list in some order that happens to differ from the previous result.

(I had to change the order of values in the original list, because it happens that the order of iteration of a HashSet is related to the hashcodes of the elements, and the simple string examples given here have consecutive hashcodes.)

There is yet another "ordering" one might consider, which is the relative execution order of pipeline operations among different elements. For parallel streams, this is completely non-deterministic. A way to observe this is to mutate an object from within a pipeline operation. (To do this safely, the object being mutated must of course be thread-safe, and it is unwise to rely on the ordering of any such side effects.) Here's an example:

List<Integer> list1 = Collections.synchronizedList(new ArrayList<>());
List<Integer> list2 =
    IntStream.range(0, 10)
        .parallel()
        .boxed()
        .peek(i -> list1.add(i))
        .collect(toList());
System.out.println(list1);
System.out.println(list2);

On my system, the output is:

[5, 6, 2, 3, 4, 8, 9, 7, 0, 1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The encounter order of the source is preserved to the output in list2, but the ordering of list1 is different in general. In fact, the ordering of the elements in list1 varies from run to run, whereas the order of elements in list2 is always the same.

In summary, there are three different kinds of ordering shown here:

  • the ordering of pipeline operations on some particular element;
  • the encounter order of the stream, and
  • the execution order of pipeline operations on different elements.

They're all distinct.

like image 64
Stuart Marks Avatar answered Oct 02 '22 16:10

Stuart Marks


Your question arose because you are mapping from one type to the same type. If you think about the formal operations you are performing, it becomes clear that there is no way to change the order of the specified operations:

  • you map items of a Stream<A> to an arbitrary type B creating a Stream<B>
  • you apply a Filter<B> on the result of the first mapping
  • you map the filtered Stream<B> to an arbitrary type C creating a Stream<C>
  • you collect the items of type C into a List<C>

Looking at these formal steps it should be clear that there is no way to change the order of these steps due to the type compatibility requirements.

The fact that in your special case all three types happen to be String does not change the logic of how the Streams work. Keep in mind that the actual types you are using for the type parameters are erased and do not exist at runtime.

The Stream implementation may coerce operations where it is useful, e.g. performing a sorted and distinct in one go but this requires that both operations are requested on the same items and Comparator. Or simply said, internal optimizations must not change the semantics of the requested operations.

like image 28
Holger Avatar answered Oct 02 '22 16:10

Holger