
When To Create New ForkJoinPool and When To Use CommonPool?

I was reading on threading and learned about fork/join API.

I found that you can either run tasks on the commonPool, which is the default pool managing the worker threads, or submit them to a newly created ForkJoinPool.

The difference between the two is as follows, to my understanding:

  • The commonPool is the main pool, created statically (some pool methods, such as shutting it down, don't work on it the way they do on other pools), and it is shared by the whole application.
  • The default parallelism of the commonPool is the number of cores - 1 (or the value of the java.util.concurrent.ForkJoinPool.common.parallelism system property), whereas the default parallelism of a newly created pool is the number of cores.
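
To check these numbers on a particular machine, a minimal sketch like the following should do. The class and method names are illustrative only, and the values printed depend on the core count and on any system property set for the JVM.

```java
import java.util.concurrent.ForkJoinPool;

final class PoolParallelismDemo {
    /** Parallelism of the statically created common pool. */
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    /** Parallelism of a pool created with the no-argument constructor. */
    static int freshPoolParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            // A pool you create yourself can, and should, be shut down.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cores       = " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool = " + commonParallelism());
        System.out.println("new pool    = " + freshPoolParallelism());
    }
}
```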

Based on the documentation, the commonPool is fine for most uses.

This all boils down to my question:

When should I use the common pool? And why so? When should I create a new pool? And why so?

joker asked Aug 28 '19 19:08


1 Answer

Short Story

The answer, like most things in software engineering, is: "It depends".

Pros of using the common pool

If you look at this wonderful article:

According to Oracle’s documentation, using the predefined common pool reduces resource consumption, since this discourages the creation of a separate thread pool per task.

and

Using the fork/join framework can speed up processing of large tasks, but to achieve this outcome, some guidelines should be followed:

  • Use as few thread pools as possible – in most cases, the best decision is to use one thread pool per application or system
  • Use the default common thread pool, if no specific tuning is needed
  • Use a reasonable threshold for splitting ForkJoinTask into subtasks
  • Avoid any blocking in your ForkJoinTasks
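
As a rough illustration of these guidelines, here is a minimal sketch of a task that sums an array on the common pool. The SumTask class and the THRESHOLD value are assumptions made for the example, not something taken from the article; a sensible threshold has to be found by measuring your own workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Long> {
    // Hypothetical threshold: below this size the range is summed sequentially.
    private static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: pure CPU work, no blocking calls.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;
    }

    /** Sums the array on the common pool, since no specific tuning is needed here. */
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data));
    }
}
```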

Pros of using dedicated pools

However, there are also some arguments against relying on the common pool for everything:

Dedicated Pool for Complex Applications

Having a dedicated pool per logical working unit in a complex application is sometimes the preferred approach. Imagine an application that:

  1. Takes in a lot of events and groups them (that can be done in parallel)
  2. Then workers do the work (that can be done in parallel as well)
  3. Finally, some cleanup workers do some cleanup (that can be done in parallel as well).

So your application has three logical work groups, each of which might have its own demands for parallelism. (Keep in mind that the common pool has its parallelism set to something fairly low on most machines.)

Better not to step on each other's toes, right? Note that this only scales up to a point, beyond which it's recommended to have a separate microservice for each of these work units. But if, for one reason or another, you are not there yet, then a dedicated ForkJoinPool per logical work unit is not a bad idea.
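
A minimal sketch of that setup, assuming three stages named after the example above. The StagePools class, the runOn helper, and the parallelism levels are hypothetical and would need to be sized from the real demands of each stage.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;

final class StagePools {
    // Hypothetical parallelism levels, one pool per logical work unit.
    static final ForkJoinPool GROUPING = new ForkJoinPool(2);
    static final ForkJoinPool PAYLOAD = new ForkJoinPool(4);
    static final ForkJoinPool CLEANUP = new ForkJoinPool(1);

    /** Runs a unit of work on the given stage's pool and waits for its result. */
    static <T> T runOn(ForkJoinPool pool, Callable<T> work) {
        return pool.submit(work).join();
    }

    /** Unlike the common pool, dedicated pools can be shut down when no longer needed. */
    static void shutdownAll() {
        GROUPING.shutdown();
        PAYLOAD.shutdown();
        CLEANUP.shutdown();
    }

    public static void main(String[] args) {
        // Each stage runs on its own workers, so a busy stage cannot starve the others.
        String grouped = runOn(GROUPING, () -> "grouped on " + Thread.currentThread().getName());
        String handled = runOn(PAYLOAD, () -> "handled on " + Thread.currentThread().getName());
        String cleaned = runOn(CLEANUP, () -> "cleaned on " + Thread.currentThread().getName());
        System.out.println(grouped);
        System.out.println(handled);
        System.out.println(cleaned);
        shutdownAll();
    }
}
```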


Other libraries

Even if your app's code has only one place where you want parallelism, you have no guarantee that some developer won't pull in a third-party dependency that also relies on the common ForkJoinPool, and then you have two places where this pool is in demand. That might be okay for your use case, and it might not be, especially if your default pool's parallelism is 4 or below.

Imagine your app's critical code (e.g. event handling or saving data to a database) having to compete for the common pool with a library that exports logs in parallel to some log sink.
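
One way to keep critical code out of that competition is to pass it an explicit executor. Parallel streams and the CompletableFuture async methods that take no executor argument normally default to the common pool, so library code using them lands there too. The sketch below is only an illustration: the CriticalWork class, the handleEvent stand-in, and the parallelism of 4 are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;

final class CriticalWork {
    // Dedicated pool for critical code; the parallelism of 4 is an illustrative assumption.
    static final ForkJoinPool CRITICAL_POOL = new ForkJoinPool(4);

    /** Hypothetical stand-in for event handling or a database write. */
    static String handleEvent(String event) {
        return event + " handled on " + Thread.currentThread().getName();
    }

    /** Runs the handler on the dedicated pool instead of the common pool. */
    static CompletableFuture<String> handleAsync(String event) {
        // The second argument is the executor; without it the common pool would normally be used.
        return CompletableFuture.supplyAsync(() -> handleEvent(event), CRITICAL_POOL);
    }

    public static void main(String[] args) {
        System.out.println(handleAsync("order-created").join());
        CRITICAL_POOL.shutdown();
    }
}
```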


Dedicated ForkJoinPool Makes Logging Neater

Additionally, the threads of the common ForkJoinPool have rather non-descriptive names, so if you are debugging or looking at logs, chances are you will have to sift through a ton of

ForkJoinPool.commonPool-worker-xx

In the situation described above, compare that with names like these, which you can get by giving each dedicated pool its own thread factory:

ForkJoinPool.grouping-worker-xx

ForkJoinPool.payload-handler-worker-xx

ForkJoinPool.cleanup-worker-xx

So there is some benefit in terms of cleaner logs when using a dedicated ForkJoinPool per logical work group.
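
Note that a pool created with the plain constructors names its threads along the lines of ForkJoinPool-1-worker-1, so descriptive names need a custom thread factory. A minimal sketch of one, assuming a hypothetical NamedPools helper and the naming pattern used above:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedPools {
    /** Creates a pool whose workers are named ForkJoinPool.<name>-worker-<n>. */
    static ForkJoinPool namedPool(String name, int parallelism) {
        AtomicInteger counter = new AtomicInteger();
        ForkJoinPool.ForkJoinWorkerThreadFactory factory = pool -> {
            // Let the default factory build the thread, then only rename it.
            ForkJoinWorkerThread thread =
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
            if (thread != null) {
                thread.setName("ForkJoinPool." + name + "-worker-" + counter.incrementAndGet());
            }
            return thread;
        };
        // No custom uncaught-exception handler, and the default (non-async) mode.
        return new ForkJoinPool(parallelism, factory, null, false);
    }

    public static void main(String[] args) {
        ForkJoinPool grouping = namedPool("grouping", 2);
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        System.out.println(grouping.submit(whoAmI).join());
        grouping.shutdown();
    }
}
```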


TL;DR

Using the common ForkJoinPool has a lower memory impact, uses fewer resources, creates fewer threads, and puts less pressure on the garbage collector. However, this approach might be insufficient for some use cases, as pointed out above.

Using a dedicated ForkJoinPool per logical work unit in your application gives you neater logging. It is worth considering when the parallelism level is low (i.e. not many cores) and when you want to avoid thread contention between logically different parts of your application. This, however, comes at the price of higher CPU utilization, higher memory overhead, and more thread creation.

Nikola Yovchev answered Sep 19 '22 00:09