Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java8 streams sequential and parallel execution produce different results?

Running the following stream example in Java8:

    System.out.println(Stream         .of("a", "b", "c", "d", "e", "f")         .reduce("", (s1, s2) -> s1 + "/" + s2)     ); 

yields:

/a/b/c/d/e/f 

Which is - of course - no surprise. Due to http://docs.oracle.com/javase/8/docs/api/index.html?overview-summary.html it shouldn't matter whether the stream is executed sequentially or parallel:

Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.

AFAIK reduce() is deterministic and (s1, s2) -> s1 + "/" + s2 is associative, so that adding parallel() should yield the same result:

    System.out.println(Stream             .of("a", "b", "c", "d", "e", "f")             .parallel()             .reduce("", (s1, s2) -> s1 + "/" + s2)     ); 

However the result on my machine is:

/a//b//c//d//e//f 

What's wrong here?

BTW: using (the preferred) .collect(Collectors.joining("/")) instead of reduce(...) yields the same result a/b/c/d/e/f for sequential and parallel execution.

JVM details:

java.specification.version: 1.8 java.version: 1.8.0_31 java.vm.version: 25.31-b07 java.runtime.version: 1.8.0_31-b13 
like image 321
Udo Avatar asked Feb 25 '15 16:02

Udo


People also ask

Is java 8 support parallel and sequential streams?

Parallel streams divide the provided task into many and run them in different threads, utilizing multiple cores of the computer. On the other hand sequential streams work just like for-loop using a single core.

What is the disadvantage of parallel stream in java 8?

1. Parallel Streams can actually slow you down. Java 8 brings the promise of parallelism as one of the most anticipated new features.

What does parallel and sequential stream do to increase performance?

A parallel stream has a much higher overhead compared to a sequential stream. Coordinating the threads takes a significant amount of time. Sequential streams sound like the default choice unless there is a performance problem to be addressed. The code used in this POC can be found on GitHub.


2 Answers

From reduce's documentation:

The identity value must be an identity for the accumulator function. This means that for all t, accumulator.apply(identity, t) is equal to t.

Which is not true in your case - "" and "a" creates "/a".

I have extracted the accumulator function and added a printout to show what happens:

BinaryOperator<String> accumulator = (s1, s2) -> {     System.out.println("joining \"" + s1 + "\" and \"" + s2 + "\"");     return s1 + "/" + s2; }; System.out.println(Stream                 .of("a", "b", "c", "d", "e", "f")                 .parallel()                 .reduce("", accumulator) ); 

This is example output (it differs between runs):

joining "" and "d" joining "" and "f" joining "" and "b" joining "" and "a" joining "" and "c" joining "" and "e" joining "/b" and "/c" joining "/e" and "/f" joining "/a" and "/b//c" joining "/d" and "/e//f" joining "/a//b//c" and "/d//e//f" /a//b//c//d//e//f 

You can add an if statement to your function to handle empty string separately:

System.out.println(Stream         .of("a", "b", "c", "d", "e", "f")         .parallel()         .reduce((s1, s2) -> s1.isEmpty()? s2 : s1 + "/" + s2) ); 

As Marko Topolnik noticed, checking s2 is not required as accumulator doesn't have to be commutative function.

like image 114
Jaroslaw Pawlak Avatar answered Oct 16 '22 01:10

Jaroslaw Pawlak


To add to other answer,

You might want to use Mutable reduction, the doc specify that doing something like

String concatenated = strings.reduce("", String::concat) 

Will give bad performance result.

We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.

So you should use a StringBuilder instead.

like image 26
Jean-François Savard Avatar answered Oct 16 '22 00:10

Jean-François Savard