I have a program where I am currently using a concurrent.futures.ThreadPoolExecutor to run multiple tasks concurrently. These tasks are typically I/O bound, involving access to local databases and remote REST APIs. However, these tasks could themselves be split into subtasks, which would also benefit from concurrency.
What I am hoping is that it is safe to use a concurrent.futures.ThreadPoolExecutor within the tasks. I have coded up a toy example, which seems to work:
import concurrent.futures

def inner(i, j):
    return i, j, i**j

def outer(i):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(inner, i, j): j for j in range(5)}
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results

def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(outer, i): i for i in range(10)}
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.extend(future.result())
    print(results)

if __name__ == "__main__":
    main()
Although this toy example appears to work, I'd like some confidence that nesting executors like this is intentionally supported. I would hope it is, because otherwise it would not be safe to use an executor to run arbitrary code, in case that code also used concurrent.futures for its own concurrency.
The concurrent.futures module provides a high-level interface for asynchronously executing callables. The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or in separate processes, using ProcessPoolExecutor.
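For reference, here is a minimal sketch of that common interface with a trivial stand-in task (the fetch function and pool sizes are purely illustrative); the two executor classes can be swapped behind the shared Executor API:

import concurrent.futures

def fetch(i):
    # Stand-in for an I/O-bound call (database query, REST request, ...).
    return i * i

def run(executor_cls):
    # ThreadPoolExecutor and ProcessPoolExecutor expose the same Executor
    # interface, so the calling code does not need to change.
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(fetch, range(8)))

if __name__ == "__main__":
    print(run(concurrent.futures.ThreadPoolExecutor))
    print(run(concurrent.futures.ProcessPoolExecutor))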
Using a ThreadPoolExecutor for a CPU-bound task can be slower than not using it at all, because Python threads are constrained by the Global Interpreter Lock (GIL).
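A rough sketch of that effect on a standard CPython build (the task and numbers here are illustrative, not a benchmark):

import concurrent.futures
import time

def cpu_task(n):
    # Pure-Python CPU work: on standard CPython the GIL prevents these
    # calls from running in parallel across threads.
    return sum(i * i for i in range(n))

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    work = [2_000_000] * 8
    timed("sequential", lambda: [cpu_task(n) for n in work])
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
        # Typically no faster than the sequential run, plus thread overhead.
        timed("threaded", lambda: list(ex.map(cpu_task, work)))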
The ThreadPoolExecutor allows you to create and manage thread pools in Python. Although it has been available since Python 3.2, it is not widely used, perhaps because of misunderstandings about the capabilities and limitations of threads in Python.
There is absolutely no issue with spawning threads from other threads. Your case is no different.
Sooner or later, though, the overhead of spawning threads becomes significant, and adding ever more threads will actually slow your software down.
I highly suggest using a library like asyncio, which handles such tasks asynchronously using a single thread with non-blocking I/O. The results will probably be even faster than with regular threads, since the overhead is much smaller.
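A minimal sketch of the same nested fan-out rewritten for asyncio, assuming the real I/O would be done with async-aware libraries (e.g. an async HTTP client for the REST calls); asyncio.sleep(0) stands in for that I/O here:

import asyncio

async def inner(i, j):
    # Stand-in for a non-blocking I/O call (async DB driver, async HTTP client, ...).
    await asyncio.sleep(0)
    return i, j, i**j

async def outer(i):
    # Fan out the subtasks concurrently; no extra threads are needed.
    return await asyncio.gather(*(inner(i, j) for j in range(5)))

async def main():
    nested = await asyncio.gather(*(outer(i) for i in range(10)))
    results = [item for group in nested for item in group]
    print(results)

if __name__ == "__main__":
    asyncio.run(main())

Blocking database drivers or REST clients would still need to be wrapped with run_in_executor to avoid stalling the event loop.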
If you do not wish to use asyncio, why not create a second pool executor inside main() and pass it to the outer() function? That way, instead of up to 25 (5x5) inner threads, you will have a maximum of 10 (2x5), which is much more reasonable.
Note that you cannot pass the same executor that runs outer() into outer() itself, as that could deadlock: every worker may end up running an outer() call that is waiting for its inner() results, while the inner() tasks sit queued behind other outer() calls and can never be scheduled.
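A minimal sketch of that suggestion, adapted from the toy example in the question; the inner_executor parameter and the pool sizes are just illustrative:

import concurrent.futures

def inner(i, j):
    return i, j, i**j

def outer(i, inner_executor):
    # Reuse the shared inner pool instead of creating a new pool per outer() call.
    futures = [inner_executor.submit(inner, i, j) for j in range(5)]
    return [future.result() for future in futures]

def main():
    # Two separate pools: one for outer() tasks, one shared by all inner() tasks.
    # Keeping them separate avoids the deadlock described above.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as outer_executor, \
         concurrent.futures.ThreadPoolExecutor(max_workers=5) as inner_executor:
        futures = [outer_executor.submit(outer, i, inner_executor) for i in range(10)]
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.extend(future.result())
    print(results)

if __name__ == "__main__":
    main()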