I wrote code using Java 8 streams and parallel streams for the same functionality, with a custom collector to perform an aggregation function. When I look at CPU usage using htop, it shows all CPU cores being used for both the 'streams' and the 'parallel streams' version. So it seems that when list.stream() is used, it also uses all CPUs. What is the precise difference between parallelStream() and stream() in terms of multi-core usage?
Java 8 introduced parallel streams for parallel processing. Since multi-core CPUs are now common thanks to cheap hardware, parallel processing can be used to perform operations faster. With a sequential stream, the calling thread (typically the main thread) does all the work.
A sequential stream is executed in a single thread, so at any given moment it occupies at most one CPU core. The elements in the stream are processed sequentially in a single pass by the stream operations, which all execute in that same thread. A parallel stream is executed by several threads, which can run on multiple CPU cores at the same time.
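One way to check this claim is to record which threads actually touch the elements. The following is only a minimal sketch: the class name `StreamThreads` and the helper `threadsUsed` are made up for illustration, and the exact thread names printed depend on the JVM and the machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original question or answers.
public class StreamThreads {

    // Returns the names of all threads that processed at least one element.
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream range = IntStream.range(0, 10_000);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Sequential: only the calling thread shows up.
        System.out.println("sequential: " + threadsUsed(false));
        // Parallel: usually the calling thread plus ForkJoinPool workers.
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```

On a multi-core machine the parallel run typically lists several worker threads next to the calling thread, while the sequential run lists exactly one name.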
Java 8 introduced the Stream API that makes it easy to iterate over collections as streams of data. It's also very easy to create streams that execute in parallel and make use of multiple processor cores.
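Sequential streams outperformed parallel streams when the number of elements in the collection was less than 100,000. Parallel streams performed significantly better than sequential streams when the number of elements was more than 100,000. The crossover point is not universal: it depends on the cost of the per-element work, the data source, and the hardware.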
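To see where the crossover lies on a particular machine, both variants can be timed on the same input. This is a rough sketch, not a rigorous benchmark: the class `SumTiming`, the arbitrary per-element function `work`, and the chosen sizes are all assumptions made for illustration, and the timings ignore JIT warm-up (a harness such as JMH is the better tool for real measurements).

```java
import java.util.stream.LongStream;

// Hypothetical timing sketch; names and sizes are illustrative assumptions.
public class SumTiming {

    // Some CPU-bound work per element, chosen arbitrarily.
    static long work(long n) {
        long h = n;
        for (int i = 0; i < 100; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    static long sumSequential(long size) {
        return LongStream.range(0, size).map(SumTiming::work).sum();
    }

    static long sumParallel(long size) {
        return LongStream.range(0, size).parallel().map(SumTiming::work).sum();
    }

    public static void main(String[] args) {
        for (long size : new long[] {1_000, 100_000, 10_000_000}) {
            long t0 = System.nanoTime();
            long a = sumSequential(size);
            long t1 = System.nanoTime();
            long b = sumParallel(size);
            long t2 = System.nanoTime();
            // Both variants must agree on the result; only the time differs.
            System.out.printf("n=%d sequential=%d us parallel=%d us same=%b%n",
                    size, (t1 - t0) / 1_000, (t2 - t1) / 1_000, a == b);
        }
    }
}
```

For small inputs the cost of splitting the work and merging the results tends to dominate, which is why the sequential version often wins there.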
Consider the following program:
    import java.util.ArrayList;
    import java.util.List;

    public class Foo {
        public static void main(String... args) {
            List<Integer> list = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                list.add(i);
            }
            list.stream().forEach(System.out::println);
        }
    }
You will notice that this program outputs the numbers from 0 to 999 sequentially, in the order in which they appear in the list. If we change stream() to parallelStream(), this is no longer the case (at least on my computer): all numbers are written, but in a different order. So, apparently, parallelStream() does indeed use multiple threads.
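If the order matters, a parallel stream can still be asked to respect the list order. The sketch below assumes a hypothetical class `OrderedParallel`; it relies on two documented properties of the Stream API: forEachOrdered processes elements in encounter order, and Collectors.toList() on an ordered source keeps that order even when the pipeline runs in parallel.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original answer.
public class OrderedParallel {

    static List<Integer> numbers(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    // Doubles each element in parallel; the collected list keeps list order.
    static List<Integer> doubledInParallel(List<Integer> input) {
        return input.parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = numbers(1000);
        // May still be processed by several threads, but prints 0..999 in order.
        list.parallelStream().forEachOrdered(System.out::println);
        // Prints [0, 2, 4, 6, 8]
        System.out.println(doubledInParallel(list).subList(0, 5));
    }
}
```

The ordering guarantee is not free: forEachOrdered restricts how much of the printing can overlap, so it trades some of the parallel speedup for deterministic output.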
The htop output is explained by the fact that even single-threaded applications are spread over multiple cores by most modern operating systems (parts of the same thread may run on several cores, but of course not at the same time). So if you see that a process used more than one core, this does not necessarily mean that the program uses multiple threads.
Also, performance may not improve when using multiple threads. The cost of synchronization may negate the gains of using multiple threads. For simple testing scenarios this is often the case. For example, in the example above, System.out is synchronized. So, effectively, only one number can be written at a time, even though multiple threads are used.
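The same effect can be reproduced without System.out. The sketch below is an illustration under assumptions: the class `ContentionDemo` and its `LockedSum` accumulator are invented stand-ins for any shared, synchronized resource. Both methods run in parallel and return the same value, but the first forces every element through one lock, while the second lets each thread sum its own chunk and merges the partial results at the end.

```java
import java.util.stream.LongStream;

// Hypothetical demo class; LockedSum stands in for any synchronized resource.
public class ContentionDemo {

    static final class LockedSum {
        private long total;
        synchronized void add(long v) { total += v; }
        synchronized long get() { return total; }
    }

    // Every element takes the same lock, so worker threads queue up behind it.
    static long sumWithSharedLock(long n) {
        LockedSum sum = new LockedSum();
        LongStream.range(0, n).parallel().forEach(sum::add);
        return sum.get();
    }

    // No shared mutable state: partial sums are combined by the stream itself.
    static long sumWithReduction(long n) {
        return LongStream.range(0, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000;
        long t0 = System.nanoTime();
        long a = sumWithSharedLock(n);
        long t1 = System.nanoTime();
        long b = sumWithReduction(n);
        long t2 = System.nanoTime();
        System.out.printf("shared lock: %d ms, reduction: %d ms, same=%b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```

A custom collector that supplies a proper combiner follows the second pattern, so it should only pay for synchronization if its accumulator touches shared state.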
Adding to @Hoopje's answer:
Before using parallelStream(), read this:
n threads provides better performance than parallel streams.
You can also read: Java Parallel Streams Are Bad for Your Health! | JRebel by Perforce