Running the following stream example in Java8:
System.out.println(Stream.of("a", "b", "c", "d", "e", "f")
        .reduce("", (s1, s2) -> s1 + "/" + s2));
yields:
/a/b/c/d/e/f
Which is, of course, no surprise. According to http://docs.oracle.com/javase/8/docs/api/index.html?overview-summary.html it shouldn't matter whether the stream is executed sequentially or in parallel:
Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.
AFAIK reduce() is deterministic and (s1, s2) -> s1 + "/" + s2 is associative, so adding parallel() should yield the same result:
System.out.println(Stream.of("a", "b", "c", "d", "e", "f")
        .parallel()
        .reduce("", (s1, s2) -> s1 + "/" + s2));
However the result on my machine is:
/a//b//c//d//e//f
What's wrong here?
BTW: using (the preferred) .collect(Collectors.joining("/")) instead of reduce(...) yields the same result, a/b/c/d/e/f, for both sequential and parallel execution.
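To make that comparison concrete, here is a minimal, self-contained sketch (the class name is just for illustration) showing that Collectors.joining produces the same output sequentially and in parallel:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoiningDemo {
    public static void main(String[] args) {
        // Sequential: joining inserts the separator only *between* elements.
        String sequential = Stream.of("a", "b", "c", "d", "e", "f")
                .collect(Collectors.joining("/"));

        // Parallel: joining merges partial results correctly,
        // so the output is identical.
        String parallel = Stream.of("a", "b", "c", "d", "e", "f")
                .parallel()
                .collect(Collectors.joining("/"));

        System.out.println(sequential); // a/b/c/d/e/f
        System.out.println(parallel);   // a/b/c/d/e/f
    }
}
```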
JVM details:
java.specification.version: 1.8
java.version: 1.8.0_31
java.vm.version: 25.31-b07
java.runtime.version: 1.8.0_31-b13
Parallel streams divide the provided task into many subtasks and run them in different threads, utilizing multiple cores of the machine. Sequential streams, on the other hand, work like a for-loop on a single core.
That said, parallel streams can actually slow you down. A parallel stream has much higher overhead than a sequential one: coordinating the threads takes a significant amount of time. Sequential streams are the sensible default unless there is a measured performance problem to be addressed.
From reduce's documentation:
The identity value must be an identity for the accumulator function. This means that for all t, accumulator.apply(identity, t) is equal to t.
Which is not true in your case: accumulator.apply("", "a") returns "/a", not "a".
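A minimal sketch (class name is just for illustration) that checks the identity law directly for this accumulator:

```java
import java.util.function.BinaryOperator;

public class IdentityCheck {
    public static void main(String[] args) {
        BinaryOperator<String> accumulator = (s1, s2) -> s1 + "/" + s2;

        // The identity law requires accumulator.apply(identity, t) to equal t
        // for every t. With identity "" and t = "a" it does not hold:
        String result = accumulator.apply("", "a");
        System.out.println(result);              // /a
        System.out.println(result.equals("a"));  // false -> "" is not an identity
    }
}
```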
I have extracted the accumulator function and added a printout to show what happens:
BinaryOperator<String> accumulator = (s1, s2) -> {
    System.out.println("joining \"" + s1 + "\" and \"" + s2 + "\"");
    return s1 + "/" + s2;
};
System.out.println(Stream.of("a", "b", "c", "d", "e", "f")
        .parallel()
        .reduce("", accumulator));
This is example output (it differs between runs):
joining "" and "d"
joining "" and "f"
joining "" and "b"
joining "" and "a"
joining "" and "c"
joining "" and "e"
joining "/b" and "/c"
joining "/e" and "/f"
joining "/a" and "/b//c"
joining "/d" and "/e//f"
joining "/a//b//c" and "/d//e//f"
/a//b//c//d//e//f
You can add an if statement to your function to handle empty string separately:
System.out.println(Stream.of("a", "b", "c", "d", "e", "f")
        .parallel()
        .reduce("", (s1, s2) -> s1.isEmpty() ? s2 : s1 + "/" + s2));
As Marko Topolnik noticed, checking s2 is not required, since the accumulator doesn't have to be a commutative function.
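An alternative sketch (class name is just for illustration) that sidesteps the identity requirement entirely: the Optional-returning overload of reduce() takes no identity element, so only actual stream elements are ever combined, and the associative accumulator is then safe in parallel as-is:

```java
import java.util.stream.Stream;

public class NoIdentityReduce {
    public static void main(String[] args) {
        // reduce(accumulator) without an identity returns an Optional and
        // never feeds an empty string into the accumulation.
        String joined = Stream.of("a", "b", "c", "d", "e", "f")
                .parallel()
                .reduce((s1, s2) -> s1 + "/" + s2)
                .orElse("");
        System.out.println(joined); // a/b/c/d/e/f
    }
}
```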
To add to the other answer: you might want to use mutable reduction. The docs specify that doing something like
String concatenated = strings.reduce("", String::concat)
will give poor performance:
We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.
So you should use a StringBuilder instead.