
How do I automatically add threads to a pool based on the computational needs of the program?

We have a C++ program which, depending on the way the user configures it, may be CPU bound or IO bound. For the purpose of loose coupling with the program configuration, I'd like to have my thread pool automatically realize when the program would benefit from more threads (i.e. CPU bound). It would be nice if it realized when it was I/O bound and reduced the number of workers, but that would just be a bonus (i.e. I'd be happy with something that just automatically grows without automatic shrinkage).

We use Boost so if there's something there that would help we can use it. I realize that any solution would probably be platform specific, so we're mainly interested in Windows and Linux, with a tertiary interest in OS X or any other *nix.

Matt Chambers asked Apr 09 '15 20:04





1 Answer

Short answer: use distinct fixed-size thread pools for CPU-intensive operations and for IO. Beyond the pool sizes, the number of active threads is further regulated by the bounded buffers (producer/consumer queues) that synchronize the compute and IO steps of your workflow.

For compute- and data-intensive problems where the bottleneck moves between resources (e.g. CPU vs. IO), it is useful to distinguish threads by the resource they are meant to use. As a first approximation:

  • A thread that is created to use more CPU cycles ("CPU thread")
  • A thread that is created to handle an asynchronous IO operation ("IO thread")

More generally, threads should be segregated by the type of resource they need. The aim is to ensure that a single thread doesn't use more than one resource (e.g. avoid alternating between reading data and processing data in the same thread). When a thread uses more than one resource, it should be split in two, and the two resulting threads should be synchronized through a bounded buffer.
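As a rough illustration of the bounded buffer described above, here is a minimal C++11 sketch built from a mutex and two condition variables. The names `BoundedBuffer` and `demo_sum`, the capacity of 4, and the `-1` end-of-stream sentinel are all assumptions made for the example, not part of any library.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded buffer: push() blocks while full, pop() blocks while empty.
template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(item));
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};

// Hypothetical demo: a producer standing in for an "IO thread" hands 100
// integers to a consumer standing in for a "CPU thread", which sums them.
// The value -1 is an assumed end-of-stream marker.
long long demo_sum() {
    BoundedBuffer<int> buffer(4);
    long long sum = 0;
    std::thread producer([&buffer] {
        for (int i = 1; i <= 100; ++i) buffer.push(i);
        buffer.push(-1);
    });
    std::thread consumer([&buffer, &sum] {
        for (int v = buffer.pop(); v != -1; v = buffer.pop()) sum += v;
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Because the buffer holds at most four items, the producer stalls whenever the consumer falls behind, which is how the buffer regulates the faster side without any explicit thread counting.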

Typically there should be exactly as many CPU threads as needed to saturate the instruction pipelines of all the cores available on the system. To ensure that, simply have a "CPU thread pool" with exactly that many threads, dedicated to computational work only. The thread count would come from boost::thread::hardware_concurrency() or std::thread::hardware_concurrency(), if that value can be trusted (it is only a hint and may be 0 when it cannot be determined). When the application needs fewer threads, there will simply be idle threads in the CPU thread pool. When it needs more, the work is queued. Instead of a "CPU thread pool", you could use C++11 std::async, but you would need to implement a thread throttling mechanism with your choice of synchronization tools (e.g. a counting semaphore).
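A fixed-size "CPU thread pool" along these lines could be sketched as follows in plain C++11. `CpuThreadPool`, `cpu_thread_count`, `demo_pool`, and the fallback of 2 threads when `hardware_concurrency()` reports 0 are assumptions for illustration; a production pool would also need a policy for exceptions and shutdown.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// hardware_concurrency() may return 0; the fallback value is an assumption.
inline std::size_t cpu_thread_count(std::size_t fallback = 2) {
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}

// Hypothetical fixed-size pool: excess work waits in the queue.
class CpuThreadPool {
public:
    explicit CpuThreadPool(std::size_t threads = cpu_thread_count()) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }
    CpuThreadPool(const CpuThreadPool&) = delete;
    CpuThreadPool& operator=(const CpuThreadPool&) = delete;

    ~CpuThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }

    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return result;
    }

    std::size_t size() const { return workers_.size(); }

private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical demo: sum the squares of 1..8 on the pool.
long long demo_pool() {
    CpuThreadPool pool;
    std::vector<std::future<long long>> results;
    for (int i = 1; i <= 8; ++i)
        results.push_back(pool.submit([i] { return static_cast<long long>(i) * i; }));
    long long total = 0;
    for (auto& r : results) total += r.get();
    return total;
}
```

The pool never grows or shrinks: with fewer jobs than workers the extra threads sleep on the condition variable, and with more jobs the surplus simply waits in `jobs_`.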

In addition to the "CPU thread pool", there can be another thread pool (or several) dedicated to asynchronous IO operations. In your case, it seems that IO resource contention is potentially a concern. If so (e.g. with a local hard drive), the maximum number of threads should be carefully controlled (e.g. at most 2 read and 2 write threads on a local hard drive). This is conceptually the same as with CPU threads: you should have one fixed-size thread pool for reading and another for writing. Unfortunately, there will probably not be any good primitive available to decide the size of these thread pools (measuring might be simple, though, if your IO patterns are very regular). If resource contention is not an issue (e.g. NAS or small HTTP requests), then boost::asio or C++11 std::async would probably be a better option than a thread pool, in which case thread throttling can be left entirely to the bounded buffers.
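For the std::async route, the counting-semaphore throttling mentioned earlier could look roughly like this, applied to the "at most 2 readers" example. C++11 has no standard semaphore, so `Semaphore` here is a hand-rolled, hypothetical class; `ThrottleReport`, `demo_throttled_reads`, and the 5 ms sleep standing in for a read are likewise assumptions.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int permits) : permits_(permits) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return permits_ > 0; });
        --permits_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++permits_;
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    int permits_;
};

struct ThrottleReport {
    int completed;  // tasks that ran to the end
    int peak;       // highest number of tasks observed running at once
};

// Hypothetical demo: launch `tasks` simulated reads, never more than
// `max_readers` at a time. A slot is taken before each launch and given
// back when the task finishes.
ThrottleReport demo_throttled_reads(int tasks = 8, int max_readers = 2) {
    Semaphore slots(max_readers);
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::atomic<int> completed{0};
    std::vector<std::future<void>> pending;

    for (int i = 0; i < tasks; ++i) {
        slots.acquire();  // blocks the submitting thread while all slots are busy
        pending.push_back(std::async(std::launch::async, [&] {
            const int now = ++active;
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(5));  // stand-in for a read
            --active;
            ++completed;
            slots.release();
        }));
    }
    for (auto& f : pending) f.get();
    return {completed.load(), peak.load()};
}
```

Acquiring the slot in the submitting thread, rather than inside the task, means no more than `max_readers` tasks are ever in flight. The sketch does not release the slot if a task throws; real code would want an RAII guard for that.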

Come Raczy answered Sep 28 '22 22:09