ProcessPoolExecutor from concurrent.futures way slower than multiprocessing.Pool

I was experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I've noticed that, with almost identical code, ProcessPoolExecutor from concurrent.futures is way slower than multiprocessing.Pool.

This is the version using multiprocessing:

    def hard_work(n):
        # Real hard work here
        pass

    if __name__ == '__main__':
        from multiprocessing import Pool, cpu_count

        try:
            workers = cpu_count()
        except NotImplementedError:
            workers = 1
        pool = Pool(processes=workers)
        result = pool.map(hard_work, range(100, 1000000))

And this is using concurrent.futures:

    def hard_work(n):
        # Real hard work here
        pass

    if __name__ == '__main__':
        from concurrent.futures import ProcessPoolExecutor, wait
        from multiprocessing import cpu_count
        try:
            workers = cpu_count()
        except NotImplementedError:
            workers = 1
        pool = ProcessPoolExecutor(max_workers=workers)
        result = pool.map(hard_work, range(100, 1000000))

Using a naïve factorization function taken from this Eli Bendersky article, these are the results on my computer (i7, 64-bit, Arch Linux):

    [juanlu@nebulae]─[~/Development/Python/test]
    └[10:31:10] $ time python pool_multiprocessing.py

    real    0m10.330s
    user    1m13.430s
    sys     0m0.260s

    [juanlu@nebulae]─[~/Development/Python/test]
    └[10:31:29] $ time python pool_futures.py

    real    4m3.939s
    user    6m33.297s
    sys     0m54.853s

I cannot profile these with the Python profiler because I get pickle errors. Any ideas?

601

asked Sep 07 '13 08:09

astrojuanlu

1 Answer

When using map from concurrent.futures, each element from the iterable is, by default, submitted separately to the executor, which creates a Future object for each call. It then returns an iterator which yields the results returned by the futures.
Future objects are rather heavyweight: they do a lot of work to support all the features they provide (callbacks, the ability to cancel, status checks, ...).
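
To make the per-item cost concrete, here is a minimal sketch of what map amounts to when nothing is batched. It is not the actual CPython implementation, and `map_via_submit` and `square` are hypothetical names introduced only for illustration.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def square(n):
    """Stand-in for real work; defined at module level so it can be pickled."""
    return n * n


def map_via_submit(executor: Executor, fn, iterable):
    """Approximate an unbatched Executor.map: one submit() call,
    and therefore one Future, per input element."""
    futures = [executor.submit(fn, item) for item in iterable]

    def results():
        # Yield results in input order, blocking on each Future in turn.
        for future in futures:
            yield future.result()

    return results()


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(list(map_via_submit(pool, square, range(5))))
```

With a range of roughly a million items, as in the question, this pattern creates roughly a million Future objects and a separate inter-process round trip for each one.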

Compared to that, multiprocessing.Pool has much less overhead. It submits jobs in batches (reducing IPC overhead) and directly uses the result returned by the function. For big batches of jobs, multiprocessing is definitely the better option.
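
Batching can also be done by hand on top of an executor. The sketch below is one possible way to do it, not how multiprocessing.Pool is implemented internally; `chunks`, `apply_to_chunk` and `batched_map` are hypothetical helpers. Note that since Python 3.5, Executor.map accepts a `chunksize` argument that has a similar effect for ProcessPoolExecutor, and multiprocessing.Pool.map has long accepted `chunksize` as well.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def square(n):
    """Stand-in for real work; defined at module level so it can be pickled."""
    return n * n


def chunks(iterable, size):
    """Yield successive tuples of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def apply_to_chunk(fn, chunk):
    """Runs inside a worker: process a whole batch as a single task."""
    return [fn(item) for item in chunk]


def batched_map(executor, fn, iterable, chunksize):
    """One Future (and one IPC round trip) per chunk instead of per item."""
    futures = [executor.submit(apply_to_chunk, fn, chunk)
               for chunk in chunks(iterable, chunksize)]
    results = []
    for future in futures:
        results.extend(future.result())
    return results


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(batched_map(pool, square, range(10), chunksize=4))
        # Built-in equivalent on Python 3.5 and later:
        print(list(pool.map(square, range(10), chunksize=4)))
```

A larger chunk size means fewer tasks and less communication overhead, at the price of coarser load balancing between workers.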

Futures are great if you want to submit long-running jobs where the overhead isn't that important, and where you want to be notified by callback, check from time to time whether they're done, or be able to cancel them individually.
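
As an illustration of those features, here is a small sketch using submit, add_done_callback, cancel and as_completed. `long_job` and `run_jobs` are hypothetical names, and the durations are arbitrary.

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def long_job(seconds):
    """Stand-in for a long-running task."""
    time.sleep(seconds)
    return seconds


def run_jobs(executor, durations):
    """Submit each job individually and use the Future API on the results."""

    def on_done(future):
        # Invoked in the submitting process when the job finishes or is cancelled.
        if future.cancelled():
            print("a job was cancelled")
        else:
            print("job finished with result", future.result())

    futures = [executor.submit(long_job, d) for d in durations]
    for future in futures:
        future.add_done_callback(on_done)

    # cancel() only succeeds while the job is still waiting to be run.
    was_cancelled = futures[-1].cancel()

    # as_completed yields futures as they finish, including cancelled ones.
    results = sorted(f.result() for f in as_completed(futures)
                     if not f.cancelled())
    return results, was_cancelled


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        print(run_jobs(pool, [0.3, 0.2, 0.1, 0.1, 0.1]))
```

Whether the cancel call succeeds depends on timing and on how many jobs the executor has already queued for its workers, so the sketch returns the outcome rather than assuming it.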

Personal note:

I can't really think of many reasons to use Executor.map. It doesn't give you any of the features of futures, except for the ability to specify a timeout. If you're just interested in the results, you're better off using one of multiprocessing.Pool's map functions.
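
For completeness, this is roughly what using that timeout looks like. It is a sketch only; `slow` and `results_within` are hypothetical names. The timeout is measured from the original call to map, and the error is raised by the result iterator, not by map itself.

```python
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


def slow(seconds):
    """Stand-in for a task that may take too long."""
    time.sleep(seconds)
    return seconds


def results_within(executor, fn, items, timeout):
    """Collect results from Executor.map until the timeout expires."""
    collected = []
    try:
        for value in executor.map(fn, items, timeout=timeout):
            collected.append(value)
    except FuturesTimeoutError:
        # The next result was not available in time; keep what we have.
        pass
    return collected


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(results_within(pool, slow, [0.0, 2.0], timeout=1.0))
```

Jobs that are already running when the timeout hits are not interrupted; leaving the with block still waits for them to finish.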

139

answered Sep 21 '22 19:09

mata

Tags: python, concurrency, multiprocessing, future, concurrent.futures

astrojuanlu

People also ask

1 Answers

mata

Recent Activity

Donate For Us