Referring to Java's Fork/Join vs ExecutorService - when to use which?: a traditional thread pool is usually used to process many independent requests, while a ForkJoinPool is used for coherent/recursive tasks, where a task may spawn subtasks and join on them later.
So why does Java 8's parallelStream use a ForkJoinPool by default rather than a traditional executor?
In many cases we call forEach() after stream() or parallelStream() and pass a functional interface as an argument. From my point of view, these tasks are independent, aren't they?
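To see which threads actually process the elements, a small illustrative snippet (the class and method names here are invented for the example) can record the thread names inside forEach():

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StreamThreads {
    // Collects the names of the threads that process the stream elements.
    static Set<String> threadNames(List<Integer> data) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        data.parallelStream().forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = threadNames(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        // Workers of the common pool are named "ForkJoinPool.commonPool-worker-N";
        // the calling thread ("main") may participate in the work as well.
        System.out.println(names);
    }
}
```

The exact set of names depends on the machine and the moment, but every name is either the caller's or a common-pool worker's.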
In the case of a parallel stream, multiple threads run simultaneously; internally the stream uses the Fork/Join framework to create and manage those threads. By default the number of worker threads matches the common pool's parallelism, typically the number of CPU cores minus one, rather than a fixed count such as four.
The Fork/Join framework, introduced in Java 7, is an implementation of the divide-and-conquer approach, in which a central ForkJoinPool executes branching ForkJoinTasks. An ExecutorService is an Executor that provides methods to manage the progress tracking and termination of asynchronous tasks.
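The divide-and-conquer pattern can be sketched with a RecursiveTask that sums an array by splitting it in half until the pieces are small (an illustrative class, not from the original question):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 4;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                           // run the left half asynchronously
        long rightSum = right.compute();       // compute the right half in this thread
        return rightSum + left.join();         // join on the forked subtask
    }

    public static void main(String[] args) {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 5050
    }
}
```

Note how fork() and join() express exactly the "spawn a subtask and join on it later" dependency that a plain ExecutorService has no special support for.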
Parallel stream: by default, such a stream is processed on ForkJoinPool.commonPool(), a thread pool shared by the entire application.
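The common pool and its parallelism can be inspected directly (a minimal sketch; the printed number is machine-dependent):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        // The shared pool that parallel streams use by default.
        ForkJoinPool common = ForkJoinPool.commonPool();
        // The default parallelism is (available CPU cores - 1); the thread that
        // submits the work participates too, so roughly all cores are used.
        System.out.println(common.getParallelism());
    }
}
```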
A sequential stream is executed in a single thread running on one CPU core. The elements in the stream are processed sequentially in a single pass by the stream operations that are executed in the same thread. A parallel stream is executed by different threads, running on multiple CPU cores in a computer.
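The sequential/parallel distinction is a single method call on the stream; both pipelines below compute the same result (an illustrative sketch, not from the original answer):

```java
import java.util.stream.IntStream;

public class SeqVsPar {
    public static void main(String[] args) {
        // Sequential: one thread processes all elements in order.
        long seq = IntStream.rangeClosed(1, 1000).mapToLong(i -> i).sum();

        // Parallel: the range is split into chunks that are summed on
        // common-pool workers, and the partial sums are combined.
        long par = IntStream.rangeClosed(1, 1000).parallel().mapToLong(i -> i).sum();

        System.out.println(seq + " " + par); // 500500 500500
    }
}
```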
One important thing is that a ForkJoinPool can execute "normal" tasks (e.g. Runnable, Callable) as well, so it's not just meant to be used with recursively created tasks.
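Because ForkJoinPool implements the ExecutorService interface, a plain Callable can be submitted to it just like to any other pool (a minimal sketch):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class PlainTasks {
    public static void main(String[] args)
            throws ExecutionException, InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(2);
        // submit(Callable) works because ForkJoinPool is an ExecutorService.
        Future<Integer> result = pool.submit(() -> 21 * 2);
        System.out.println(result.get()); // 42
        pool.shutdown();
    }
}
```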
Another important thing is that a ForkJoinPool has multiple task queues, one per worker thread, whereas a normal executor (e.g. ThreadPoolExecutor) has just one. This has a big impact on what kind of tasks they should run.
The smaller and more numerous the tasks a normal executor has to execute, the higher the synchronization overhead of distributing them to the workers: if most tasks are small, the workers access the shared internal task queue frequently, and each access must be synchronized.
This is where the ForkJoinPool shines with its multiple queues. Each worker takes tasks from its own queue, which usually requires no blocking synchronization; when its own queue is empty, it can steal a task from another worker, but from the other end of that worker's queue. Since work-stealing is supposed to be rather rare, this too rarely causes synchronization overhead.
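The structural contrast can be sketched by driving many tiny tasks through both pool types; both complete all the work, but the ThreadPoolExecutor funnels every task through one shared blocking queue while the ForkJoinPool distributes them across per-worker deques (an illustrative sketch with invented names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class QueueContrast {
    // Submits 10,000 tiny tasks and returns how many completed.
    static long run(ExecutorService pool) throws InterruptedException {
        LongAdder done = new LongAdder();
        for (int i = 0; i < 10_000; i++) pool.execute(done::increment);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        // Classic executor: all 4 workers pull from ONE shared blocking queue.
        ThreadPoolExecutor shared = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // ForkJoinPool: each of the 4 workers has its own deque and steals when idle.
        ForkJoinPool perWorker = new ForkJoinPool(4);

        System.out.println(run(shared) + " " + run(perWorker)); // 10000 10000
    }
}
```

The visible results are identical; the difference is internal contention, which is why the ForkJoinPool tends to pull ahead as tasks get smaller and more numerous.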
Now what does this have to do with parallel streams? The streams framework is designed to be easy to use. Parallel streams are meant for when you want to split work into many concurrent tasks easily, where all tasks are rather small and simple. That is exactly where the ForkJoinPool is the reasonable choice: it provides better performance on huge numbers of small tasks, and it can handle longer tasks as well if it has to.