I was playing around with infinite streams and made this program for benchmarking. Basically, the bigger the number you provide, the faster it finishes, because a larger target lets more of the random values pass the filter. However, I was amazed to find that using a parallel stream resulted in dramatically worse performance than a sequential stream. Intuitively, one would expect an infinite stream of random numbers to be generated and evaluated much faster in a multi-threaded environment, but this appears not to be the case. Why is this?
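    import java.util.stream.IntStream;

    // Wrapper class name is illustrative; the body is the original benchmark.
    public class InfiniteStreamBenchmark {
        public static void main(String[] args) {
            final int target = Integer.parseInt(args[0]);
            if (target <= 0) {
                System.err.println("Target must be between 1 and 2147483647");
                return;
            }
            final long startTime = System.currentTimeMillis();
            System.out.println(
                // Same range as new Double(...).intValue(), without the deprecated constructor
                IntStream.generate(() -> (int) (Math.random() * Integer.MAX_VALUE))
                    //.parallel()
                    .filter(i -> i <= target)
                    .findFirst()
                    .getAsInt()
            );
            final long endTime = System.currentTimeMillis();
            System.out.println("Execution time: " + (endTime - startTime) + " ms");
        }
    }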
The performance of both streams degrades quickly as the number of values that must be generated increases. However, the parallel stream performs worse than the sequential stream in all the cases measured.
parallelStream() processes elements on multiple threads. If we run such code multiple times, we can see that the output order may differ from run to run. When order is not important, a parallel stream can boost performance, but only when the work per element is large enough to outweigh the cost of coordinating the threads.
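A minimal sketch of the ordering point, assuming Java 8 or later; the class `OrderDemo` and the helper `collectInOrder` are hypothetical names chosen for illustration. `forEach` on a parallel stream makes no promise about order, while `forEachOrdered` and collecting into a `List` both respect encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class illustrating ordering in parallel streams.
public class OrderDemo {

    // Collects 1..n from a parallel stream; Collectors.toList() keeps encounter order.
    static List<Integer> collectInOrder(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // forEach on a parallel stream gives no ordering guarantee,
        // so this line may print in a different order on each run.
        IntStream.rangeClosed(1, 10).parallel()
                .forEach(i -> System.out.print(i + " "));
        System.out.println();

        // forEachOrdered processes elements in encounter order, even in parallel.
        IntStream.rangeClosed(1, 10).parallel()
                .forEachOrdered(i -> System.out.print(i + " "));
        System.out.println();

        // Collecting into a List also preserves encounter order.
        System.out.println(collectInOrder(10)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Only the first printed line should vary between runs; its exact order depends on the machine and on thread scheduling.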
A parallel stream has much higher overhead than a sequential stream: splitting the work, scheduling tasks on the common fork-join pool, and combining the results all take time. Sequential streams should therefore be the default choice unless there is a measured performance problem to address. The code used in this POC can be found on GitHub.
Parallel streams can actually slow you down. Java 8 brought the promise of easy parallelism as one of its most anticipated new features.
I totally agree with the other comments and answers, but your test does behave strangely when the target is very low. On my modest laptop, the parallel version is on average about 60x slower for very low targets. This extreme difference cannot be explained by the overhead of parallelization in the stream API alone, so I was also amazed :-). IMO the culprit lies here:
Math.random()
Internally, this call relies on a single global instance of java.util.Random. The documentation of Random states:
Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.
So I think the very poor performance of the parallel execution compared to the sequential one is explained by thread contention on the shared Random instance rather than by any other overhead. If you use ThreadLocalRandom instead (as the documentation recommends), the performance difference will not be so dramatic. Another option would be to implement a more advanced number supplier.
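A sketch of both suggestions, assuming Java 8 or later. The class and method names are hypothetical, and using `SplittableRandom.ints(origin, bound)` as the "more advanced number supplier" is my own assumption, not something the answer specifies. The first variant keeps the question's shape but swaps `Math.random()` for `ThreadLocalRandom`; the second uses a random source whose spliterator can be split across worker threads. No timings are claimed here; run it on your own machine to compare.

```java
import java.util.SplittableRandom;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical class; method names are illustrative only.
public class RandomTargetSearch {

    // Variant 1: same pipeline as the question, but each worker thread draws
    // from its own ThreadLocalRandom instead of the shared Random behind Math.random().
    // nextInt(Integer.MAX_VALUE) yields 0..2147483646, the same range as the original.
    static int findWithThreadLocalRandom(int target) {
        return IntStream.generate(() -> ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE))
                .parallel()
                .filter(i -> i <= target)
                .findAny() // any match will do; generate() is unordered anyway
                .getAsInt();
    }

    // Variant 2 (assumed alternative supplier): SplittableRandom.ints(origin, bound)
    // returns an effectively unlimited stream whose splits use independent generators.
    static int findWithSplittableRandom(int target) {
        return new SplittableRandom()
                .ints(0, Integer.MAX_VALUE)
                .parallel()
                .filter(i -> i <= target)
                .findAny()
                .getAsInt();
    }

    public static void main(String[] args) {
        // Hypothetical default target when no argument is given.
        final int target = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        if (target <= 0) {
            System.err.println("Target must be between 1 and 2147483647");
            return;
        }

        long start = System.nanoTime();
        int a = findWithThreadLocalRandom(target);
        long mid = System.nanoTime();
        int b = findWithSplittableRandom(target);
        long end = System.nanoTime();

        System.out.println("ThreadLocalRandom: " + a + " in " + (mid - start) / 1_000_000 + " ms");
        System.out.println("SplittableRandom:  " + b + " in " + (end - mid) / 1_000_000 + " ms");
    }
}
```

Calling `ThreadLocalRandom.current()` inside the lambda matters: caching the result in a field and sharing it across threads would defeat the purpose.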