Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java ForkJoinPool with non-recursive tasks, does work-stealing work?

I want to submit Runnable tasks into ForkJoinPool via a method:

forkJoinPool.submit(Runnable task) 

Note, I use JDK 7.

Under the hood, they are transformed into ForkJoinTask objects. I know that ForkJoinPool is efficient when a task is split into smaller ones recursively.

Question:

Does work-stealing still work in the ForkJoinPool if there is no recursion?

Is it worth it in this case?

Update 1: Tasks are small and can be unbalanced. Even for strictly equal tasks, such things like context switching, thread scheduling, parking, pages misses etc. get in the way leading to the imbalance.

Update 2: Doug Lea wrote in the Concurrency JSR-166 Interest group, by giving a hint on this:

This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.

I presume, when it comes to reasonably small CPU-bound tasks, ForkJoinPool is the way to go, thanks to this optimization. The main point is that these tasks are already small and needn't a recursive decomposition. Work-stealing works, regardless whether it is a big or small task - tasks can be grabbed by another free worker from the Deque's tail of a busy worker.

Update 3: Scalability of ForkJoinPool - benchmarking by Akka team of ping-pong shows great results.

Despite this, to apply ForkJoinPool more efficiently requires performance tuning.

like image 418
Ivan Voroshilin Avatar asked May 05 '15 07:05

Ivan Voroshilin


People also ask

What is work stealing in Java?

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors (or cores).

How does ForkJoinPool work in Java?

ForkJoinPool acts recursively, unlike Executor threads, which splits the task and submits smaller chunks to worker Threads. ForkJoinPool takes a big task, splits it into smaller tasks, and those smaller tasks split themselves again into subtasks until each subtask is atomic or not divisible. So it works recursively.

What is difference between ExecutorService and ForkJoinPool?

The Fork/Join framework in Java 7 is an implementation of the Divide and Conquer algorithm, in which a central ForkJoinPool executes branching ForkJoinTasks. ExecutorService is an Executor that provides methods to manage the progress-tracking and termination of asynchronous tasks.

What is ForkJoinPool commonPool?

public static ForkJoinPool commonPool() Returns the common pool instance. This pool is statically constructed; its run state is unaffected by attempts to shutdown() or shutdownNow() . However this pool and any ongoing processing are automatically terminated upon program System.


1 Answers

ForkJoinPool source code has a nice section called "Implementation Overview", read up for an ultimate truth. The explanation below is my understanding for JDK 8u40.

Since day one, ForkJoinPool had a work queue per worker thread (let's call them "worker queues"). The forked tasks are pushed into the local worker queue, ready to be popped by the worker again and be executed -- in other words, it looks like a stack from worker thread perspective. When a worker depletes its worker queue, it goes around and tries to steal the tasks from other worker queues. That is "work stealing".

Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well the tasks to steal, they got there and tried to see if external work is available. In this design, there is no advantage against a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue.

It changed significantly after then. After this submission queue was identified as the serious performance bottleneck, Doug Lea et al. striped the submission queues as well. In hindsight, that is an obvious idea: you can reuse most of the mechanics available for worker queues. You can even loosely distribute these submission queues per worker threads. Now, the external submission goes into one of the submission queues. Then, workers that have no work to munch on, can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.

I have seen many workloads benefiting from this. This particular design advantage of ForkJoinPool even for plain non-recursive tasks was recognized a long ago. Many users at concurrency-interest@ asked for a simple work-stealing executor without all the ForkJoinPool arcanery. This is one of the reasons, why we have Executors.newWorkStealingPool() in JDK 8 onward -- currently delegating to ForkJoinPool, but open for providing a simpler implementation.

like image 114
Aleksey Shipilev Avatar answered Oct 04 '22 19:10

Aleksey Shipilev