The Akka docs state that the default dispatcher is a fork-join-executor because it "gives excellent performance in most cases".
I'm wondering why that is.
From the ForkJoinPool documentation:
A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist). This enables (1) efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks), as well as (2) when many small tasks are submitted to the pool from external clients. Especially when setting asyncMode to true in constructors, ForkJoinPools may also be (3) appropriate for use with event-style tasks that are never joined.
At first, I guessed that Akka is not an example of case (1), because I can't figure out how Akka could be forking tasks: what would be the task that could be forked into many tasks?
I see each message as an independent task, which is why I think Akka is similar to case (2), where the messages are many small tasks being submitted (via ! and ?) to the ForkJoinPool.
The next question, although not strictly related to Akka, is: why can a use case that doesn't use fork and join (the main capabilities of ForkJoinPool, which enable work-stealing) still benefit from ForkJoinPool?
From Scalability of Fork Join Pool
We noticed that the number of context switches was abnormal, above 70000 per second.
That must be the problem, but what is causing it? Viktor came up with the qualified guess that it must be the task queue of the thread pool executor, since that is shared and the locks in the LinkedBlockingQueue could potentially generate the context switches when there is contention.
However, if it is true that Akka doesn't use ForkJoinTasks, all tasks submitted by external clients will be queued in the shared queue, so the contention should be the same as in ThreadPoolExecutor.
So, my questions are:
1. Does Akka use ForkJoinTasks (case (1)), or is its usage related to case (2)?
2. Why is ForkJoinPool beneficial in case (2) if all the tasks submitted by external clients are pushed to a shared queue and no work-stealing happens?

The correct answer is the one from johanandren; however, I want to add some highlights.
Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well the tasks to steal, they got there and tried to see if external work is available. In this design, there is no advantage against a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue. [...]
Now, the external submission goes into one of the submission queues. Then, workers that have no work to munch on, can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.
So, this enabled work stealing in scenarios where fork/join wasn't used. As Doug Lea says:
Substantially better throughput when lots of clients submit lots of tasks. (I've measured up to 60X speedups on micro-benchmarks). The idea is to treat external submitters in a similar way as workers -- using randomized queuing and stealing. (This required a big internal refactoring to disassociate work queues and workers.) This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.
4% is indeed not much for FJP. There's still a trade-off you make with FJP that you need to be aware of: FJP keeps threads spinning for a while so it can handle just-in-time arriving work faster. This ensures good latency in many cases. However, especially if your pool is overprovisioned, you are trading a small latency gain for more power consumption in almost-idle situations.
ForkJoinPool is an implementation of ExecutorService that manages worker threads and provides tools to get information about the thread pool's state and performance. A worker thread can execute only one task at a time, but the ForkJoinPool doesn't create a separate thread for every single subtask.
Akka gives developers a unified way to build scalable and fault-tolerant software that can scale up on multicore systems, and scale out in distributed computing environments, which today often means in the cloud.
The Fork/Join framework in Java 7 is an implementation of the divide-and-conquer approach, in which a central ForkJoinPool executes branching ForkJoinTasks. ExecutorService is an Executor that provides methods to manage termination and to track the progress of asynchronous tasks.
Its implementation restricts the maximum number of running threads to 32767, and attempting to create a pool with a greater size results in an IllegalArgumentException. The level of parallelism can also be controlled globally by setting java.
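The 32767 limit mentioned above can be checked with a small sketch. The class and method names here (ParallelismLimitSketch, rejects) are made up for illustration; only the ForkJoinPool(int) constructor and its IllegalArgumentException come from the JDK.

```java
import java.util.concurrent.ForkJoinPool;

class ParallelismLimitSketch {

    // Returns true if the ForkJoinPool constructor rejects the requested parallelism.
    static boolean rejects(int parallelism) {
        try {
            ForkJoinPool pool = new ForkJoinPool(parallelism);
            pool.shutdown(); // release the pool right away; no tasks were submitted
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("parallelism 4 rejected?     " + rejects(4));
        System.out.println("parallelism 32768 rejected? " + rejects(32768));
    }
}
```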
The FJP in Akka is run with asyncMode = true, so the answer to the first question is: external clients submitting short/small async workloads. Each submitted workload either dispatches an actor to process one or a few messages from its inbox, or executes Scala Future operations.
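As a rough illustration of that usage pattern (not Akka's actual dispatcher code), here is a minimal Java sketch of a pool created with asyncMode = true that receives many small, independent tasks from an external thread. The class name, method name, and task body are assumptions made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncModeSubmitSketch {

    // Submits taskCount small independent tasks from an external (non-worker)
    // thread and returns how many of them actually ran.
    static int runSmallTasks(int taskCount) {
        ForkJoinPool pool = new ForkJoinPool(
                Runtime.getRuntime().availableProcessors(),
                ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                null,   // no custom UncaughtExceptionHandler
                true);  // asyncMode = true: FIFO mode for tasks that are never joined
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(taskCount);
        try {
            for (int i = 0; i < taskCount; i++) {
                // Each Runnable stands in for "process a few messages of one actor".
                pool.execute(() -> {
                    processed.incrementAndGet();
                    done.countDown();
                });
            }
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return processed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("processed = " + runSmallTasks(10_000));
    }
}
```

None of the tasks fork or join anything; the pool is used purely as an executor for externally submitted work, which is the situation the question describes as case (2).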
When a non-ForkJoinTask is scheduled to run on the FJP, it is adapted to a ForkJoinTask and enqueued just like any other ForkJoinTask. There isn't a single submission queue where tasks are queued (there was in an early version, JDK 7 perhaps); there are many, to avoid contention, and an idle thread can pick (steal) tasks from queues other than its own if its own is empty.
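The adaptation step can be sketched with the JDK's public ForkJoinTask.adapt method. This is only an illustration of the idea; the wrapper Akka uses internally may differ, and the class and method names below are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.atomic.AtomicBoolean;

class AdaptRunnableSketch {

    // Wraps a plain Runnable in a ForkJoinTask, runs it on a pool,
    // and reports whether the Runnable was executed.
    static boolean runAdapted() {
        AtomicBoolean ran = new AtomicBoolean(false);
        Runnable plain = () -> ran.set(true);

        // adapt() returns a ForkJoinTask whose execution calls plain.run()
        ForkJoinTask<?> adapted = ForkJoinTask.adapt(plain);

        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            pool.execute(adapted); // enqueued like any other ForkJoinTask
            adapted.join();        // wait for completion from the external thread
            return ran.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("adapted task ran: " + runAdapted());
    }
}
```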
Note that by default we are currently running on a forked version of the Java 8 FJP, as we saw a significant decrease in throughput with the Java 9 FJP when that came out (it contains quite a few changes). Issue #21910 discusses that, if you are interested. Additionally, if you want to play around with benchmarking different pools, you can find a few *Pool benchmarks here: https://github.com/akka/akka/tree/master/akka-bench-jmh/src/main/scala/akka/actor
http://letitcrash.com/post/17607272336/scalability-of-fork-join-pool
Scalability of Fork Join Pool
Akka 2.0 message passing throughput scales way better on multi-core hardware than in previous versions, thanks to the new fork join executor developed by Doug Lea. One micro benchmark illustrates a 1100% increase in throughput!
...
http://cs.oswego.edu/pipermail/concurrency-interest/2012-January/008987.html
...
Highlights:
These improvements also lead to a less hostile stance about submitting possibly-blocking tasks. An added parag in the ForkJoinTask documentation provides some guidance (basically: we like them if they are small (even if numerous) and don't have dependencies).
...