
Why does parallelStream use a ForkJoinPool, not a normal thread pool?

According to "Java's Fork/Join vs ExecutorService - when to use which?", a traditional thread pool is usually used to process many independent requests, whereas a ForkJoinPool is used to process coherent/recursive tasks, where a task may spawn a subtask and join on it later.

So why does Java 8's parallelStream use a ForkJoinPool by default rather than a traditional executor?

In many cases, we call forEach() after stream() or parallelStream() and pass a functional interface as the argument. From my point of view, these tasks are independent, aren't they?

Michael Ouyang asked May 13 '20 03:05




1 Answer

One important point is that a ForkJoinPool can execute "normal" tasks (e.g. Runnable, Callable) as well, so it is not meant only for recursively created tasks.
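As a minimal sketch of that point, the snippet below submits a plain Runnable and a plain Callable to a ForkJoinPool through the same submit() calls any ExecutorService offers. The class name, the method name runPlainTasks and the parallelism of 2 are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.atomic.AtomicInteger;

public class PlainTasksOnForkJoinPool {

    // Hypothetical helper: runs one Runnable and one Callable on a
    // ForkJoinPool and combines their effects into a single int.
    static int runPlainTasks() {
        ForkJoinPool pool = new ForkJoinPool(2); // parallelism 2 is an arbitrary choice
        try {
            AtomicInteger counter = new AtomicInteger();
            Runnable increment = counter::incrementAndGet; // no recursion, no subtasks
            Callable<Integer> answer = () -> 41;

            ForkJoinTask<?> done = pool.submit(increment);
            ForkJoinTask<Integer> value = pool.submit(answer);

            done.join();                         // wait for the Runnable to finish
            return value.join() + counter.get(); // 41 + 1
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPlainTasks()); // prints 42
    }
}
```

Neither task forks or joins anything; the pool simply acts as an executor here.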

Another important point is that a ForkJoinPool has multiple task queues, one for each worker thread, whereas a normal executor (e.g. ThreadPoolExecutor) has just one shared queue. This strongly affects what kind of tasks each of them should run.

The smaller and more numerous the tasks a normal executor has to execute, the higher the synchronization overhead of distributing them to the workers. If most of the tasks are small, the workers access the shared internal task queue often, and contention on that queue becomes a noticeable cost.

This is where the ForkJoinPool shines with its multiple queues. Every worker takes tasks from its own queue, which most of the time requires no blocking synchronization. If its own queue is empty, a worker can steal a task from another worker, but it takes that task from the opposite end of the other worker's queue. This also rarely causes synchronization overhead, since work-stealing is supposed to be rather rare.
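To make the queue behaviour a bit more concrete, here is a small sketch of a task that splits itself into many small subtasks. Each fork() hands a subtask to the pool, where (per the description above) it waits in the forking worker's own queue until that worker or an idle, stealing worker picks it up. The class RangeSum, the THRESHOLD value and the helper sum() are assumptions chosen for illustration, not anything the streams framework itself uses.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cutoff for "small enough"
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSum(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small task: do the work directly on the current worker.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // keep working on the right half ourselves
        return left.join() + rightSum;   // wait for the left half and combine
    }

    // Hypothetical convenience wrapper: sums 0 .. n-1 on the common pool.
    static long sum(int n) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(0, n));
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000)); // prints 499999500000
    }
}
```

With a million elements and a threshold of 1,000, this creates on the order of a thousand leaf tasks, which is the "many small tasks" situation the answer describes.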

Now, what does that have to do with parallel streams? The streams framework is designed to be easy to use. Parallel streams are meant for cases where you want to split work into many concurrent tasks easily, with each task being rather small and simple. This is where the ForkJoinPool is the reasonable choice: it performs better on huge numbers of small tasks, and it can handle longer tasks as well if it has to.
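A quick way to see which threads a parallel stream actually runs on is to record the thread name inside forEach(). The sketch below does that; the class and method names are made up, and the exact set of names depends on the machine and JVM settings, so no output is shown.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamThreads {

    // Hypothetical helper: runs a parallel forEach over 0 .. n-1 and
    // returns the names of all threads that processed at least one element.
    static Set<String> threadNames(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        IntStream.range(0, n)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        threadNames(100_000).forEach(System.out::println);
    }
}
```

On a multi-core machine this typically lists the calling thread alongside several ForkJoinPool.commonPool worker threads, which matches the default described in the question.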

akuzminykh answered Oct 17 '22 21:10
