
ThreadPoolExecutor vs ForkJoinPool: stealing subtasks

From the Java docs:

A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute subtasks created by other active tasks (eventually blocking waiting for work if none exist).

This enables efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks). When setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined.

In the ForkJoinPool examples I have gone through, unlike with ThreadPoolExecutor, I have not seen a parameter to set the queue size. I also do not understand how the ForkJoinPool work-stealing mechanism operates.

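// Creating the ThreadPoolExecutor: core size 2, max size 10, bounded queue of 3000.
// threadFactory and rejectionHandler are assumed to be defined elsewhere.
ThreadPoolExecutor executorPool = new ThreadPoolExecutor(2, 10, 60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(3000), threadFactory, rejectionHandler);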

Assume that I have created a ThreadPoolExecutor with 10 threads and that 3000 Callable tasks have been submitted. How do these threads share the load of executing the tasks?

And how does ForkJoinPool behave differently for the same use case?

Ravindra babu asked Oct 31 '15 05:10




2 Answers

If you have 3000 tasks in advance, and they are not going to spawn other tasks, the two will not behave substantially differently: with 10 threads, 10 tasks will be run at a time until they are all done.
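To make this concrete, here is a minimal sketch that submits the same flat batch of tasks to both kinds of pool. The class name `FlatTasksDemo`, the helper `runAll`, and the trivial `id * 2` workload are made up for illustration. Note one assumption that differs from the question's configuration: ThreadPoolExecutor only creates threads beyond corePoolSize once its queue is full, so the sketch sets core = max = 10 with an unbounded queue to actually get ten workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class FlatTasksDemo {

    // Submits taskCount independent tasks (none of them spawns subtasks),
    // waits for all of them, sums their results, and shuts the pool down.
    static long runAll(ExecutorService pool, int taskCount) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            tasks.add(() -> id * 2); // hypothetical placeholder for real work
        }
        long sum = 0;
        try {
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Ten worker threads, unbounded queue (illustrative settings).
        ExecutorService tpe = new ThreadPoolExecutor(
                10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // ForkJoinPool with a parallelism level of 10.
        ExecutorService fjp = new ForkJoinPool(10);

        // Both pools produce the same total for the same flat batch.
        System.out.println("ThreadPoolExecutor: " + runAll(tpe, 3000));
        System.out.println("ForkJoinPool:       " + runAll(fjp, 3000));
    }
}
```

Since no task forks anything, the ForkJoinPool has no subtasks to steal here; both pools simply drain the batch with up to ten tasks running at once.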

ForkJoinPool is designed for the case where you have one or a few tasks to start with, but the tasks know how to split themselves up into subtasks. In this situation, ForkJoinPool is optimized to permit tasks to check on the availability of processing threads and split themselves up appropriately.
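As a rough illustration of a task that "knows how to split itself", here is a sketch of a RecursiveTask that sums an array. The class name `RangeSum` and the `THRESHOLD` value are assumptions chosen for the example, not anything prescribed by the framework; a fixed size cutoff is used here as the simplest splitting rule.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    // Hypothetical cutoff: ranges at or below this size are summed directly.
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo;
    private final int hi; // exclusive

    RangeSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        RangeSum left = new RangeSum(data, lo, mid);
        RangeSum right = new RangeSum(data, mid, hi);
        left.fork();                     // queued on this worker's own deque; an idle worker may steal it
        long rightSum = right.compute(); // keep working in the current thread
        return left.join() + rightSum;   // collect the forked half
    }

    // Starts from a single root task; all further tasks are created by splitting.
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(RangeSum.sum(data));
    }
}
```

Only one task is handed to the pool; everything else is generated by the task itself. A task that wants to adapt to how busy the pool is could, for example, consult ForkJoinTask.getSurplusQueuedTaskCount() instead of a fixed threshold.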

Warren Dew answered Oct 17 '22 12:10


In ForkJoinPool, there are two kinds of queues: the pool-level one, which you use when submitting a task from outside the pool, and the thread-specific ones (one for each worker thread). From within a ForkJoinTask you can fork new tasks (generally a split of your problem).

These new tasks are not offered to the pool queue but to the queue of the thread that forked them. They are therefore taken in priority over tasks in the pool queue, as if you had done all the work in the same task. Furthermore, the invoking task appears to be blocked waiting for its subtasks to complete.

In reality, the "blocked" time is spent executing subtasks. It would be wasteful to let other threads sit idle while one of them is flooded with work, so "work stealing" takes place: idle threads take tasks from the queues of busy threads.

Going further: to be efficient, "work stealing" takes tasks from the opposite end of the queue from the one its owning thread works on. This greatly reduces contention on the queue.
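The "opposite end" idea can be pictured with a toy, single-threaded model. This is only a sketch using java.util.ArrayDeque, not how ForkJoinPool implements its queues internally; the class name `DequeEndsDemo` and the task labels are made up. It assumes the default (non-async) mode, in which a worker processes its own tasks in LIFO order while thieves take the oldest task.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeEndsDemo {

    // Returns { task taken by the owning thread, task taken by a thief }.
    static String[] ownerAndThief() {
        Deque<String> workerDeque = new ArrayDeque<>();

        // The owner forks three subtasks, pushing each on top of its own deque.
        workerDeque.push("subtask-1");
        workerDeque.push("subtask-2");
        workerDeque.push("subtask-3");

        // The owner pops from the top: the most recently forked task.
        String ownerTakes = workerDeque.pop();
        // A thief takes from the other end: the oldest task.
        String thiefTakes = workerDeque.pollLast();

        return new String[] { ownerTakes, thiefTakes };
    }

    public static void main(String[] args) {
        String[] taken = ownerAndThief();
        System.out.println("owner: " + taken[0] + ", thief: " + taken[1]);
        // prints "owner: subtask-3, thief: subtask-1"
    }
}
```

Because the owner and the thief touch different ends, they only collide when the deque is nearly empty. In a recursive split the oldest task is also usually the largest one, so a single steal moves a big piece of work.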

Still on efficiency: it is better to split the problem into only two subtasks and let each subtask split again and again, even if you know the problem must ultimately be split into N parts. This is because "work stealing" requires concurrent access to a shared resource, so limit how often it is triggered and how much contention it causes.
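One way to apply this advice when the number of parts N is known up front is to halve the range of part indexes rather than forking N tasks at once. The sketch below assumes a hypothetical `PartsAction` class whose "work" is just marking a part as done; substitute the real per-part processing.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicIntegerArray;

class PartsAction extends RecursiveAction {
    private final AtomicIntegerArray done; // counts how often each part was processed
    private final int lo;                  // first part index (inclusive)
    private final int hi;                  // last part index (exclusive)

    PartsAction(AtomicIntegerArray done, int lo, int hi) {
        this.done = done;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= 0) {
            return; // empty range, nothing to do
        }
        if (hi - lo == 1) {
            done.incrementAndGet(lo); // hypothetical placeholder for processing part lo
            return;
        }
        // Split the range of parts in two instead of forking all N parts at once.
        int mid = (lo + hi) >>> 1;
        invokeAll(new PartsAction(done, lo, mid), new PartsAction(done, mid, hi));
    }

    // Processes parts 0..n-1 and returns the per-part counters.
    static AtomicIntegerArray processParts(int n) {
        AtomicIntegerArray done = new AtomicIntegerArray(n);
        ForkJoinPool.commonPool().invoke(new PartsAction(done, 0, n));
        return done;
    }

    public static void main(String[] args) {
        System.out.println(processParts(7)); // each of the 7 parts is processed exactly once
    }
}
```

With this shape, a thief that steals one task takes roughly half of the remaining range in a single operation, so far fewer steals are needed than if all N leaf tasks sat in one thread's queue.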

LoganMzz answered Oct 17 '22 13:10