 

Stream.reduce always preserving order on parallel, unordered stream

I've gone through several previous questions, such as "Encounter order preservation in java stream", this answer by Brian Goetz, the javadoc for Stream.reduce() and the java.util.stream package javadoc, and yet I still can't grasp the following:

Take this piece of code:

  import java.util.Arrays;
  import java.util.HashSet;

  public class Main {
    public static void main(String... args) {
      final String[] alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
      System.out.println("Alphabet: ".concat(Arrays.toString(alphabet)));
      // HashSet source, parallel and explicitly unordered, concatenated by reduce
      System.out.println(new HashSet<>(Arrays.asList(alphabet))
            .parallelStream()
            .unordered()
            .peek(System.out::println)
            .reduce("", (a,b) -> a + b, (a,b) -> a + b));
    }
  }

Why is the reduction always* preserving the encounter order?

* So far, after several dozen runs, the output has always been the same.
Asked Aug 04 '17 at 21:08 by George Aristy



3 Answers

First of all, unordered does not imply any actual shuffling; all it does is set a flag on the Stream pipeline that could later be leveraged.

Shuffling the source elements could be much more expensive than the operations in the stream pipeline themselves, so the implementation might choose not to do it (as in this case).

At the moment (I tested this and looked at the sources of jdk-8 and jdk-9), reduce does not take that flag into account. Note that this could very well change in a future build or release.
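A minimal sketch of the two points above, assuming a plain ArrayList source and a made-up class name UnorderedFlagDemo: calling .unordered() moves no elements around, and in the implementation discussed here a parallel reduce over that source still combines the left partial result with the right one. The second method shows observed behaviour only; the Stream API does not promise it once the stream is unordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class UnorderedFlagDemo {

    static final List<String> SOURCE =
            new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E", "F", "G", "H"));

    // unordered() only marks the pipeline; a sequential traversal still walks
    // the ArrayList from index 0 upwards, so nothing is shuffled.
    static List<String> sequentialUnordered() {
        return SOURCE.stream()
                .unordered()
                .collect(Collectors.toList());
    }

    // The reduce implementation does not look at the flag and combines the
    // left sub-result with the right one, so the concatenation follows the
    // source order. Observed behaviour, not an API guarantee.
    static String parallelUnorderedReduce() {
        return SOURCE.parallelStream()
                .unordered()
                .reduce("", (a, b) -> a + b, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(sequentialUnordered());
        System.out.println(parallelUnorderedReduce());
    }
}
```

If a future release starts exploiting the flag in reduce, the second line is exactly the kind of output that would be allowed to change.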

Also, when you say unordered, you actually mean that you don't care about the order, and a stream that returns the same result every time does not violate that rule.

For example, see this question/answer, which explains that findFirst (just another terminal operation) was changed to take unordered into consideration in java-9, as opposed to java-8.

Answered Oct 10 '22 at 03:10 by Eugene



To help explain this, I am going to shorten the alphabet to ABCD.

The parallel stream will divide the elements into two pieces: AB and CD. When these are combined later, the result of the AB side is passed as the first argument to the combiner function, and the result of the CD side as the second argument. This holds regardless of which of the two actually finishes first.
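A hypothetical sketch of that splitting, written with the fork/join framework. ReduceRange and PositionalCombine are names invented for this example, and the JDK's real reduction task is more involved; the sketch only shows the principle that the combiner receives partial results by position, not by completion time. The combiner wraps its arguments in parentheses so the shape of the split is visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;

class PositionalCombine {

    // Hypothetical task reducing list.subList(from, to); it mimics the idea
    // only and is not the JDK's own implementation.
    static final class ReduceRange extends RecursiveTask<String> {
        private final List<String> list;
        private final int from;
        private final int to;
        private final BinaryOperator<String> combiner;

        ReduceRange(List<String> list, int from, int to, BinaryOperator<String> combiner) {
            this.list = list;
            this.from = from;
            this.to = to;
            this.combiner = combiner;
        }

        @Override
        protected String compute() {
            if (to - from <= 1) {
                // Leaf: a single element (or nothing for an empty range).
                return from < to ? list.get(from) : "";
            }
            int mid = (from + to) >>> 1;
            ReduceRange left = new ReduceRange(list, from, mid, combiner);
            ReduceRange right = new ReduceRange(list, mid, to, combiner);
            left.fork();                          // left half may run on another worker
            String rightResult = right.compute(); // right half runs on this thread
            String leftResult = left.join();
            // Whichever half finished first, the left result is the first
            // argument and the right result is the second.
            return combiner.apply(leftResult, rightResult);
        }
    }

    static String reduce(List<String> list) {
        BinaryOperator<String> combiner = (a, b) -> "(" + a + b + ")";
        return ForkJoinPool.commonPool().invoke(new ReduceRange(list, 0, list.size(), combiner));
    }

    public static void main(String[] args) {
        // Prints ((AB)(CD)): AB and CD are reduced separately, then joined left to right.
        System.out.println(reduce(Arrays.asList("A", "B", "C", "D")));
    }
}
```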

The unordered operation affects some operations on a stream, such as limit, but it does not affect a simple reduce.

Answered Oct 10 '22 at 04:10 by Joe C



TL;DR: .reduce() does not always preserve order; its result depends on the characteristics of the stream's spliterator.

Spliterator

The encounter order of the stream depends on the stream's spliterator (none of the other answers mentioned this).

Different sources provide different spliterators. You can find the spliterator type of each collection in its source code:

HashSet -> HashMap#KeySpliterator = Not ordered

ArrayDeque = Ordered

ArrayList = Ordered

TreeSet -> TreeMap#Spliterator = Ordered and sorted
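Instead of reading the sources, you can also ask each collection's spliterator directly. A small sketch (the class name SpliteratorFlags is made up) that checks the ORDERED and SORTED characteristics of the four collections listed above:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class SpliteratorFlags {

    static final List<String> DATA = Arrays.asList("a", "b", "c", "1", "2", "3", "A", "B", "C");

    // Asks the collection's own spliterator which characteristics it reports.
    static boolean ordered(Collection<String> source) {
        return source.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    static boolean sorted(Collection<String> source) {
        return source.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    static void describe(Collection<String> source) {
        System.out.printf("%-10s ORDERED=%-5b SORTED=%b%n",
                source.getClass().getSimpleName(), ordered(source), sorted(source));
    }

    public static void main(String[] args) {
        describe(new HashSet<>(DATA));    // HashMap key spliterator: not ordered
        describe(new ArrayDeque<>(DATA)); // ordered
        describe(new ArrayList<>(DATA));  // ordered
        describe(new TreeSet<>(DATA));    // ordered and sorted
    }
}
```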

See also: logicbig.com - Ordering; logicbig.com - Stateful vs Stateless.

Additionally, you can apply the .unordered() intermediate operation, which specifies that the following operations in the stream should not rely on ordering.

Stream operations (mostly stateful) that are affected by the spliterator and by the use of the .unordered() method include:

  • .findFirst()
  • .limit()
  • .skip()
  • .distinct()

These operations can give different results depending on the ordering property of the stream and its spliterator.
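A sketch of the difference for three of these operations (skip, limit and findFirst), assuming an ArrayList of the integers 0 to 999 and a made-up class name OrderSensitiveOps. On an ordered parallel stream the results are pinned to the encounter order; after .unordered(), limit(5) is allowed to keep any five elements, so only the size and membership of that result can be relied on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class OrderSensitiveOps {

    static final List<Integer> SOURCE = new ArrayList<>();
    static {
        for (int i = 0; i < 1_000; i++) {
            SOURCE.add(i);
        }
    }

    // Ordered parallel stream: skip/limit must respect the encounter order,
    // so this is always [10, 11, 12, 13, 14].
    static List<Integer> orderedSlice() {
        return SOURCE.parallelStream().skip(10).limit(5).collect(Collectors.toList());
    }

    // After unordered(), limit(5) may keep ANY five elements of the source.
    static List<Integer> unorderedSlice() {
        return SOURCE.parallelStream().unordered().limit(5).collect(Collectors.toList());
    }

    // Ordered parallel stream: findFirst returns the first match in encounter order.
    static Optional<Integer> orderedFirst() {
        return SOURCE.parallelStream().filter(i -> i % 7 == 3).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(orderedSlice());
        System.out.println(unorderedSlice()); // five elements, not necessarily 0..4
        System.out.println(orderedFirst());
    }
}
```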

The .peek() method does not take ordering into consideration: if the stream is executed in parallel, it receives (and in the question's code, prints) elements in whatever order the worker threads happen to process them.
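A sketch contrasting the two, assuming an ordered ArrayList source and a made-up class name PeekVersusReduce. The order in which peek() sees elements is recorded in a thread-safe queue and may differ from run to run, while the reduce result of an ordered source is fixed by the encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PeekVersusReduce {

    static final List<String> SOURCE =
            new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E", "F", "G", "H"));

    // Records the order in which peek() sees the elements (thread-safe).
    static final Queue<String> PEEKED = new ConcurrentLinkedQueue<>();

    static String run() {
        PEEKED.clear();
        return SOURCE.parallelStream()
                .peek(PEEKED::add)
                .reduce("", (a, b) -> a + b, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        String result = run();
        System.out.println("peek saw:   " + PEEKED); // may vary between runs
        System.out.println("reduce got: " + result); // ordered source: ABCDEFGH
    }
}
```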

.reduce()

Now for the terminal .reduce() method. The intermediate .unordered() operation has no effect on the type of spliterator (as @Eugene mentioned). Importantly, the spliterator stays the same as the source spliterator. If the source spliterator is ordered, the result of .reduce() will be ordered; if the source is unordered, the result of .reduce() will be unordered as well.

You are using new HashSet<>(Arrays.asList(alphabet)) to create the stream. Its spliterator is unordered. It is just a coincidence that your result comes out ordered: you are using single upper-case letters as the elements, and for those the unordered result happens to be the same. (The hashCode of a single-character String is its char code, so with the default table sizing A to Z land in consecutive buckets of the HashSet's backing HashMap and are iterated alphabetically.) If you mix in numbers, or mix lower-case and upper-case letters, this no longer holds. For example, take the following inputs; the first one is a subset of the example you posted:

HashSet .reduce() - Unordered

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "a1Ab2Bc3C"
"Apple","Orange","Banana","Mango" -> "AppleMangoOrangeBanana"

TreeSet .reduce() - Ordered, Sorted

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "123ABCabc"
"Apple","Orange","Banana","Mango" -> "AppleBananaMangoOrange"

ArrayList .reduce() - Ordered

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "abc123ABC"
"Apple","Orange","Banana","Mango" -> "AppleOrangeBananaMango"

As you can see, testing the .reduce() operation only with an alphabet source stream can lead to false conclusions.
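A sketch that reproduces the second row of each table above, assuming the question's pipeline with only the source collection swapped (the class name ReduceBySource is made up). The HashSet result reflects that set's iteration order, which is an implementation detail of HashMap, so treat it as an observation rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class ReduceBySource {

    static final List<String> INPUT = Arrays.asList("a", "b", "c", "1", "2", "3", "A", "B", "C");

    // Same pipeline as in the question; only the source collection changes.
    static String reduce(Collection<String> source) {
        return source.parallelStream()
                .unordered()
                .reduce("", (a, b) -> a + b, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println("HashSet:   " + reduce(new HashSet<>(INPUT)));
        System.out.println("TreeSet:   " + reduce(new TreeSet<>(INPUT)));
        System.out.println("ArrayList: " + reduce(new ArrayList<>(INPUT)));

        // A single-character String's hashCode is its char code; that value
        // picks the HashSet bucket and therefore its iteration order.
        for (String s : INPUT) {
            System.out.print(s + "=" + s.hashCode() + " ");
        }
        System.out.println();
    }
}
```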

So the answer is: .reduce() does not always preserve order; its result depends on the characteristics of the stream's spliterator.

Answered Oct 10 '22 at 04:10 by RenatoIvancic
