
Why is ThreadPoolExecutor's default max_workers decided based on the number of CPUs?

The documentation for concurrent.futures.ThreadPoolExecutor says:

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
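For reference, the rule quoted above can be written out as a small sketch. The function name `default_max_workers_35` is hypothetical and is used here only for illustration; it mirrors the "number of processors multiplied by 5" wording of the 3.5 changelog, and falls back to 1 when `os.cpu_count()` returns `None`. Later Python versions use a different formula, so this is only the behaviour the quote describes.

```python
import os


def default_max_workers_35(cpu_count=None):
    """Return the default described in the 3.5 docs: processors * 5.

    `cpu_count` is a parameter only so the rule can be tried with
    different values; when omitted, the real processor count is used.
    """
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return cpu_count * 5


if __name__ == "__main__":
    print("2 CPUs ->", default_max_workers_35(2))
    print("4 CPUs ->", default_max_workers_35(4))
    print("this machine ->", default_max_workers_35())
```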

I want to understand why the default max_workers value depends on the number of CPUs. Regardless of how many CPUs I have, the global interpreter lock (GIL) means only one Python thread can execute Python code at any point in time.

Let us assume each thread is I/O intensive: it spends only 10% of its time on the CPU and 90% of its time waiting for I/O. Let us then assume we have 2 CPUs. With 10 such threads we already use 100% of one CPU, and we can't use any more CPU than that because only one thread runs at any point in time. The same holds even if there are 4 CPUs.
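The arithmetic in this paragraph can be spelled out as a short sketch. The helper `threads_to_saturate` is a hypothetical name, and the model is the question's own simplification: every thread needs a fixed percentage of CPU time, and only one thread can hold the CPU at a time, so the machine's CPU count does not appear in the calculation at all.

```python
def threads_to_saturate(cpu_percent_per_thread, usable_cores=1):
    """Number of threads needed to keep `usable_cores` fully busy.

    Under the question's assumption that only one Python thread runs at
    a time, `usable_cores` stays at 1 no matter how many CPUs exist.
    Integer arithmetic is used to get an exact ceiling division.
    """
    total_percent = 100 * usable_cores
    return -(-total_percent // cpu_percent_per_thread)


if __name__ == "__main__":
    # Each thread is on the CPU 10% of the time (90% waiting for I/O).
    for machine_cpus in (2, 4):
        needed = threads_to_saturate(10)
        print(machine_cpus, "CPUs:", needed, "threads saturate one core")
```

Under this model the answer is the same for 2 and 4 CPUs, which is exactly the asker's point.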

So why is the default max_workers decided based on the number of CPUs?

Lone Learner asked May 18 '19 03:05



1 Answer

It's a lot easier to check the number of processors than to check how I/O bound your program is, especially at thread pool startup, when your program hasn't really started working yet. There isn't really anything better to base the default on.
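Since the default cannot know how I/O bound your program is, you can pass `max_workers` explicitly when you do know. Below is a minimal sketch of that; `fake_io_task` and `run_io_tasks` are hypothetical names, and `time.sleep` stands in for real I/O such as a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(n):
    """Stand-in for an I/O-bound call: the thread mostly waits."""
    time.sleep(0.01)  # sleeping releases the GIL, like blocking I/O does
    return n


def run_io_tasks(n_tasks, workers):
    """Run `n_tasks` waits on a pool with an explicit worker count."""
    # max_workers is chosen by the caller instead of relying on the default.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order the inputs were submitted.
        return list(pool.map(fake_io_task, range(n_tasks)))


if __name__ == "__main__":
    print(run_io_tasks(5, workers=10))
```

The worker count of 10 here is only the figure from the question's example, not a recommendation.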

Also, adding the default was a pretty low-effort, low-discussion change. (Previously, there was no default.) Trying to get fancy would have been way more work.

That said, getting fancier might pay off. Maybe some kind of dynamic system that adjusts thread count based on load, so you don't have to decide the count at the time when you have the least information. It won't happen unless someone writes it, though.

user2357112 supports Monica answered Oct 10 '22 17:10
