
Parallel stream vs serial stream

Is it possible for a parallel stream to give a different result than a serial stream in Java 8? As far as I know, a parallel stream is the same as a serial stream, except that it is divided into multiple substreams; the only difference is speed. All operations are applied to the elements, and the results of the substreams are combined at the end. So, in my opinion, the result should be the same for parallel and serial streams. Is it possible for the following code to give a different result? And if so, why does it happen?

int[] i = {1, 2, 5, 10, 9, 7, 25, 24, 26, 34, 21, 23, 23, 25, 27, 852, 654, 25, 58};
Double serial = Arrays.stream(i).filter(si -> {
    return si > 5;
}).mapToDouble(Double::new).map(NewClass::add).reduce(Math::atan2).getAsDouble();

Double parallel = Arrays.stream(i).filter(si -> {
    return si > 5;
}).parallel().mapToDouble(Double::new).map(NewClass::add).reduce(Math::atan2).getAsDouble();

System.out.println("serial: " + serial);
System.out.println("parallel: " + parallel);

public static double add(double i) {
    return i + 0.005;
}

The results are:

serial: 3.6971567726175894E-23

parallel: 0.779264049587662
Ján Яabčan asked Sep 26 '15 17:09




1 Answer

The Javadoc for reduce() says:

Performs a reduction on the elements of this stream, using an associative accumulation function, [...] The accumulator function must be an associative function.

In the Javadoc, the word "associative" links to this definition:

An operator or function op is associative if the following holds:

 (a op b) op c == a op (b op c)

The importance of this to parallel evaluation can be seen if we expand this to four terms:

 a op b op c op d == (a op b) op (c op d)

So we can evaluate (a op b) in parallel with (c op d), and then invoke op on the results.

Examples of associative operations include numeric addition, min, and max, and string concatenation.
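
To make the quoted definition concrete, here is a minimal sketch that reduces the question's array with two associative operations, integer addition and max. The class and method names are made up for illustration. Because the grouping does not matter for these operations, the serial and parallel pipelines should agree:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class AssociativeReduceDemo {
    // The array from the question.
    static final int[] VALUES =
        {1, 2, 5, 10, 9, 7, 25, 24, 26, 34, 21, 23, 23, 25, 27, 852, 654, 25, 58};

    // Builds the filtered stream, optionally switching the pipeline to parallel mode.
    static IntStream aboveFive(boolean parallel) {
        IntStream s = Arrays.stream(VALUES).filter(v -> v > 5);
        return parallel ? s.parallel() : s;
    }

    // Integer addition is associative, so any grouping yields the same sum.
    static int sum(boolean parallel) {
        return aboveFive(parallel).reduce(0, Integer::sum);
    }

    // max is associative as well.
    static int max(boolean parallel) {
        return aboveFive(parallel).reduce(Integer.MIN_VALUE, Integer::max);
    }

    public static void main(String[] args) {
        System.out.println("sum serial:   " + sum(false));
        System.out.println("sum parallel: " + sum(true));
        System.out.println("max serial:   " + max(false));
        System.out.println("max parallel: " + max(true));
    }
}
```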

As @PaulBoddington mentioned in a comment, atan2 is not associative, and is therefore not valid for a reduction operation.
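
A small sketch can show both points: that atan2 fails the associativity test, and that a different grouping of the question's values changes the result. The names here are hypothetical, and the two-way split in splitFold is only an assumption standing in for whatever splitting the runtime actually performs:

```java
import java.util.Arrays;

public class Atan2GroupingDemo {
    // Left-to-right fold over xs[from..to): ((x0 op x1) op x2) op ..., with op = Math.atan2.
    static double leftFold(double[] xs, int from, int to) {
        double acc = xs[from];
        for (int k = from + 1; k < to; k++) {
            acc = Math.atan2(acc, xs[k]);
        }
        return acc;
    }

    // Hypothetical two-way split: fold each half on its own, then combine the two results.
    static double splitFold(double[] xs) {
        int mid = xs.length / 2;
        return Math.atan2(leftFold(xs, 0, mid), leftFold(xs, mid, xs.length));
    }

    // The question's values after filter(si > 5) and the +0.005 step.
    static double[] questionValues() {
        int[] i = {1, 2, 5, 10, 9, 7, 25, 24, 26, 34, 21, 23, 23, 25, 27, 852, 654, 25, 58};
        return Arrays.stream(i).filter(si -> si > 5).mapToDouble(si -> si + 0.005).toArray();
    }

    public static void main(String[] args) {
        double a = 1, b = 2, c = 3;
        // If atan2 were associative, these two lines would print the same value.
        System.out.println("(a op b) op c = " + Math.atan2(Math.atan2(a, b), c));
        System.out.println("a op (b op c) = " + Math.atan2(a, Math.atan2(b, c)));

        double[] xs = questionValues();
        System.out.println("left fold:  " + leftFold(xs, 0, xs.length));
        System.out.println("split fold: " + splitFold(xs));
    }
}
```

In the left fold, nearly every step computes atan2 of a small accumulator and a much larger element, which is roughly their quotient, so the accumulator keeps shrinking; that is consistent with the tiny serial result in the question. Once partial results from separate chunks are combined with each other, the outcome can be of order 1 instead. The exact parallel value depends on how the runtime splits the work, so this sketch is not expected to reproduce 0.779264049587662.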


Unrelated

Your stream pipeline could also be tidied up. Calling parallel() up front makes the intent obvious (the whole pipeline runs in parallel wherever the call appears), the lambda can be shortened, and you shouldn't box the double:

double parallel = Arrays.stream(i)
                        .parallel()           // <-- before filter
                        .filter(si -> si > 5) // <-- shorter
                        .asDoubleStream()     // <-- not boxing
                        .reduce(Math::atan2)
                        .getAsDouble();
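
As far as the java.util.stream documentation describes it, sequential versus parallel is a property of the whole pipeline, decided by the mode in effect when the terminal operation runs, so the position of parallel() is a readability choice rather than a behavioral one. A minimal check, with a made-up class name, using isParallel():

```java
import java.util.Arrays;

public class ParallelPlacementDemo {
    // parallel() called before filter().
    static boolean parallelBeforeFilter(int[] values) {
        return Arrays.stream(values).parallel().filter(v -> v > 5).isParallel();
    }

    // parallel() called after filter(), as in the question.
    static boolean parallelAfterFilter(int[] values) {
        return Arrays.stream(values).filter(v -> v > 5).parallel().isParallel();
    }

    public static void main(String[] args) {
        int[] i = {1, 2, 5, 10, 9, 7, 25, 24, 26, 34, 21, 23, 23, 25, 27, 852, 654, 25, 58};
        System.out.println("parallel() before filter: " + parallelBeforeFilter(i));
        System.out.println("parallel() after filter:  " + parallelAfterFilter(i));
    }
}
```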
Andreas answered Oct 03 '22 00:10
