Referring to Java's Fork/Join vs ExecutorService - when to use which?: a traditional thread pool is usually used to process many independent requests, while a ForkJoinPool is used for coherent/recursive tasks, where a task may spawn subtasks and join on them later.
So why does Java 8's parallelStream use a ForkJoinPool by default rather than a traditional executor?
In many cases we call forEach() after stream() or parallelStream() and pass a functional interface as an argument. From my point of view, these tasks are independent, aren't they?
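To see which threads actually process the elements, a small illustrative snippet (the class and method names here are invented for the example) can record the thread names inside forEach():

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StreamThreads {
    // Collects the names of the threads that process the stream elements.
    static Set<String> threadNames(List<Integer> data) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        data.parallelStream().forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = threadNames(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        // Workers of the common pool are named "ForkJoinPool.commonPool-worker-N";
        // the calling thread ("main") may participate in the work as well.
        System.out.println(names);
    }
}
```

The exact set of names depends on the machine and the moment, but every name is either the caller's or a common-pool worker's.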
In the case of a parallel stream, multiple threads run simultaneously; internally the stream uses the Fork/Join framework to create and manage those threads. By default the number of worker threads matches the common pool's parallelism, typically the number of CPU cores minus one, rather than a fixed count such as four.
The Fork/Join framework, introduced in Java 7, is an implementation of the divide-and-conquer approach, in which a central ForkJoinPool executes branching ForkJoinTasks. An ExecutorService is an Executor that provides methods to manage the progress tracking and termination of asynchronous tasks.
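The divide-and-conquer pattern can be sketched with a RecursiveTask that sums an array by splitting it in half until the pieces are small (an illustrative class, not from the original question):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 4;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                           // run the left half asynchronously
        long rightSum = right.compute();       // compute the right half in this thread
        return rightSum + left.join();         // join on the forked subtask
    }

    public static void main(String[] args) {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 5050
    }
}
```

Note how fork() and join() express exactly the "spawn a subtask and join on it later" dependency that a plain ExecutorService has no special support for.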
Parallel stream: by default, such a stream is processed on ForkJoinPool.commonPool(), a thread pool shared by the entire application.
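The common pool and its parallelism can be inspected directly (a minimal sketch; the printed number is machine-dependent):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        // The shared pool that parallel streams use by default.
        ForkJoinPool common = ForkJoinPool.commonPool();
        // The default parallelism is (available CPU cores - 1); the thread that
        // submits the work participates too, so roughly all cores are used.
        System.out.println(common.getParallelism());
    }
}
```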
A sequential stream is executed in a single thread running on one CPU core. The elements in the stream are processed sequentially in a single pass by the stream operations that are executed in the same thread. A parallel stream is executed by different threads, running on multiple CPU cores in a computer.
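The sequential/parallel distinction is a single method call on the stream; both pipelines below compute the same result (an illustrative sketch, not from the original answer):

```java
import java.util.stream.IntStream;

public class SeqVsPar {
    public static void main(String[] args) {
        // Sequential: one thread processes all elements in order.
        long seq = IntStream.rangeClosed(1, 1000).mapToLong(i -> i).sum();

        // Parallel: the range is split into chunks that are summed on
        // common-pool workers, and the partial sums are combined.
        long par = IntStream.rangeClosed(1, 1000).parallel().mapToLong(i -> i).sum();

        System.out.println(seq + " " + par); // 500500 500500
    }
}
```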
One important thing is that a ForkJoinPool can execute "normal" tasks (e.g. Runnable, Callable) as well, so it's not just meant to be used with recursively created tasks.
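Because ForkJoinPool implements the ExecutorService interface, a plain Callable can be submitted to it just like to any other pool (a minimal sketch):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class PlainTasks {
    public static void main(String[] args)
            throws ExecutionException, InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(2);
        // submit(Callable) works because ForkJoinPool is an ExecutorService.
        Future<Integer> result = pool.submit(() -> 21 * 2);
        System.out.println(result.get()); // 42
        pool.shutdown();
    }
}
```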
Another important thing is that a ForkJoinPool has multiple task queues, one per worker thread, whereas a normal executor (e.g. ThreadPoolExecutor) has just one. This has a big impact on what kind of tasks they should run.
The smaller and more numerous the tasks a normal executor has to execute, the higher the synchronization overhead of distributing them to the workers: if most tasks are small, the workers access the shared internal task queue frequently, and each access must be synchronized.
This is where the ForkJoinPool shines with its multiple queues. Each worker takes tasks from its own queue, which usually requires no blocking synchronization; when its own queue is empty, it can steal a task from another worker, but from the other end of that worker's queue. Since work-stealing is supposed to be rather rare, this too rarely causes synchronization overhead.
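The structural contrast can be sketched by driving many tiny tasks through both pool types; both complete all the work, but the ThreadPoolExecutor funnels every task through one shared blocking queue while the ForkJoinPool distributes them across per-worker deques (an illustrative sketch with invented names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class QueueContrast {
    // Submits 10,000 tiny tasks and returns how many completed.
    static long run(ExecutorService pool) throws InterruptedException {
        LongAdder done = new LongAdder();
        for (int i = 0; i < 10_000; i++) pool.execute(done::increment);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        // Classic executor: all 4 workers pull from ONE shared blocking queue.
        ThreadPoolExecutor shared = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // ForkJoinPool: each of the 4 workers has its own deque and steals when idle.
        ForkJoinPool perWorker = new ForkJoinPool(4);

        System.out.println(run(shared) + " " + run(perWorker)); // 10000 10000
    }
}
```

The visible results are identical; the difference is internal contention, which is why the ForkJoinPool tends to pull ahead as tasks get smaller and more numerous.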
Now what does this have to do with parallel streams? The streams framework is designed to be easy to use. Parallel streams are meant for when you want to split work into many concurrent tasks easily, where all tasks are rather small and simple. That is exactly where the ForkJoinPool is the reasonable choice: it provides better performance on huge numbers of small tasks, and it can handle longer tasks as well if it has to.