
Python multithreading max_workers

According to the documentation of ThreadPoolExecutor

If max_workers is None or not given, it will default to the number of processors on the machine.

If I don't set a value, like this:

ThreadPoolExecutor(max_workers=None)

is that bad for performance compared to passing a very low value such as 2? Will Python already allocate workers for every processor when the value is None, versus only 2 when I pass a number?

Dejell asked Nov 08 '16 17:11




1 Answer

To begin with, you seem to be quoting the wrong part of the documentation in your link, namely the one for processes, not threads. The one for concurrent.futures.ThreadPoolExecutor states:

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
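To get a feel for what that default means on a given machine, here is a minimal sketch that computes it. The helper names `default_workers_3_5` and `default_workers_3_8` are made up for illustration. The second one reflects my understanding that Python 3.8 and later use `min(32, os.cpu_count() + 4)` instead of the 3.5 rule quoted above, so check the documentation for the version you actually run.

```python
import os


def default_workers_3_5(cpu_count=None):
    """Default per the 3.5 docs quoted above: number of processors * 5."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return cpus * 5


def default_workers_3_8(cpu_count=None):
    """Assumed default for Python 3.8+: min(32, number of processors + 4)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(32, cpus + 4)


if __name__ == "__main__":
    print("processors reported:", os.cpu_count())
    print("3.5-style default:  ", default_workers_3_5())
    print("3.8-style default:  ", default_workers_3_8())
```

Either way, passing `max_workers=2` caps the pool at 2 threads, while `None` lets the library pick a larger cap based on the processor count.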


Since you're using threads, not processes, the assumption is that your application is I/O-bound, not CPU-bound, and that you're using the pool for concurrency, not parallelism. The more threads you use, the higher the concurrency you'll achieve (up to a point), but the fewer CPU cycles each thread gets for useful work, since more time goes to context switches. You have to instrument your application under typical workloads to see what works best for you. There is no universally optimal value.
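As a rough way to do that instrumentation, you could time the same batch of work under several `max_workers` values. The sketch below is only an illustration: `fake_io_task` stands in for your real I/O call (it just sleeps), and `time_pool` is a hypothetical helper, not part of `concurrent.futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay):
    """Stand-in for a blocking I/O call (assumption: real work would go here)."""
    time.sleep(delay)
    return delay


def time_pool(max_workers, n_tasks=20, delay=0.05):
    """Run n_tasks simulated I/O calls and return (elapsed seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_io_task, [delay] * n_tasks))
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    # None lets the library pick its default cap.
    for workers in (2, 5, 20, None):
        elapsed, _ = time_pool(workers)
        print(f"max_workers={workers}: {elapsed:.2f}s")
```

With purely sleeping tasks, a pool of 2 should take noticeably longer than a larger one, but real workloads (network latency, rate limits, shared locks) can behave differently, so measure with your own task. Also, as far as I know, CPython starts worker threads lazily as tasks are submitted, so `max_workers` acts as an upper bound rather than an up-front allocation.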

Ami Tavory answered Oct 19 '22 01:10