With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:
myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));
As long as I don't care about the order, would it always be beneficial to use the parallel version? One would think it is faster to divide the work across more cores.
Are there other considerations? When should parallel stream be used and when should the non-parallel be used?
(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)
Parallel Stream
Parallel streams are a useful Java feature for running part of a program in parallel, even if the whole program cannot be parallelized. They can take advantage of multi-core processors, which can improve performance.
The Stream API makes it possible to execute a sequential stream in parallel without rewriting the code. The primary reason for using parallel streams is to improve performance while at the same time ensuring that the results obtained are the same, or at least compatible, regardless of the mode of execution.
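To make the "same pipeline, different execution mode" point concrete, here is a minimal sketch. The class and method names (SameResultDemo, sumOfEvenSquares) are made up for illustration; only the switch between sequential and parallel execution matters. Because the pipeline is stateless and the reduction is a plain sum, both modes should return the same value.

```java
import java.util.stream.IntStream;

public class SameResultDemo {
    // Sums the squares of the even numbers in [1, n].
    // The 'parallel' flag only changes the execution mode, not the pipeline itself.
    static long sumOfEvenSquares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(i -> i % 2 == 0)
                    .mapToLong(i -> (long) i * i)
                    .sum();
    }

    public static void main(String[] args) {
        long sequential = sumOfEvenSquares(1_000, false);
        long parallel = sumOfEvenSquares(1_000, true);
        System.out.println("sequential = " + sequential);
        System.out.println("parallel   = " + parallel);
        System.out.println("equal      = " + (sequential == parallel));
    }
}
```

Note that this only shows the results agree; it says nothing about which mode is faster for a given workload.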
1. Parallel Streams can actually slow you down. Java 8 brings the promise of parallelism as one of the most anticipated new features.
A parallel stream has a much higher overhead compared to a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if:
I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)
I have a performance problem in the first place
I don't already run the process in a multi-threaded environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)
In your example, the performance will in any case be driven by the synchronized access to System.out.println(), so making this process parallel will have no effect, or even a negative one.
Moreover, remember that parallel streams don't magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you'll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.
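As an illustration of the side-effect problem, here is a rough sketch (SideEffectDemo and evensSafe are hypothetical names, not from the question). The commented-out variant mutates a shared ArrayList from forEach, which is not thread-safe and may lose elements or throw when run in parallel. The safe variant lets collect() build the result, so no shared mutable state is touched by the lambdas.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SideEffectDemo {
    // Unsafe pattern (shown only as a comment, do not use in parallel):
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel()
    //            .filter(i -> i % 2 == 0)
    //            .forEach(out::add);   // concurrent writes to a non-thread-safe list

    // Safe pattern: no shared state; collect() combines per-thread partial results.
    static List<Integer> evensSafe(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evensSafe(10));
    }
}
```

Since the source range is ordered, collect(Collectors.toList()) keeps the encounter order even in parallel mode, which forEach does not promise.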
In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not.
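For what "measure" could look like in practice, here is a deliberately crude sketch using System.nanoTime. Everything in it is an assumption for illustration: the workload (counting primes by trial division), the limit, and the names CrudeTiming, countPrimes and timeMillis. Timings from a loop like this are noisy because of JIT warm-up and garbage collection, so for real decisions a proper benchmarking harness such as JMH is the better tool.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class CrudeTiming {
    // CPU-heavy per-element work, a stand-in for real processing.
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in [1, limit], sequentially or in parallel.
    static long countPrimes(long limit, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, limit);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.filter(CrudeTiming::isPrime).count();
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(LongSupplier task) {
        long start = System.nanoTime();
        task.getAsLong();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long limit = 2_000_000;
        // Warm-up runs so the JIT has compiled the hot code before timing.
        countPrimes(limit, false);
        countPrimes(limit, true);
        System.out.println("sequential ms: " + timeMillis(() -> countPrimes(limit, false)));
        System.out.println("parallel ms:   " + timeMillis(() -> countPrimes(limit, true)));
    }
}
```

Try shrinking the limit or replacing isPrime with something trivial: with little work per element, the parallel version may well come out slower, which is the point made above.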