The ordering of results from the returned iterator of imap_unordered
is arbitrary, and it doesn't seem to run faster than imap
(which I checked with the following code), so why would one use this method?
```python
from multiprocessing import Pool
import time

def square(i):
    time.sleep(0.01)
    return i ** 2

p = Pool(4)
nums = range(50)

start = time.time()
print('Using imap')
for i in p.imap(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))

start = time.time()
print('Using imap_unordered')
for i in p.imap_unordered(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))
```
Use the multiprocessing.Pool class when you need to execute tasks that may or may not take arguments and may or may not return a result once they are complete, and when you need to execute different kinds of ad hoc tasks, such as calling different target task functions.
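A minimal sketch of that "different target functions" pattern, using apply_async (the fetch and compute tasks here are hypothetical stand-ins):

```python
from multiprocessing import Pool

def fetch(name):
    # hypothetical ad hoc task of one kind
    return 'fetched ' + name

def compute(n):
    # hypothetical ad hoc task of another kind
    return n * n

def main():
    with Pool(2) as p:
        # apply_async submits individual tasks, so one pool can run
        # different target functions side by side (unlike map/imap,
        # which apply a single function to an iterable)
        r1 = p.apply_async(fetch, ('example',))
        r2 = p.apply_async(compute, (7,))
        print(r1.get(), r2.get())  # prints: fetched example 49

if __name__ == '__main__':
    main()
```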
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
Pool allows multiple jobs per process, which may make it easier to parallelize your program. If you have a number of jobs to run in parallel, you can create a Pool with as many processes as you have CPU cores and then pass the list of jobs to pool.map.
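A minimal sketch of that setup (square is a placeholder for your real job function):

```python
from multiprocessing import Pool, cpu_count

def square(i):
    return i * i

def main():
    jobs = list(range(50))
    # one worker process per CPU core, then hand the whole job list to map
    with Pool(processes=cpu_count()) as p:
        results = p.map(square, jobs)  # blocks until every job is done, keeps input order
    print(results[:5])  # prints: [0, 1, 4, 9, 16]

if __name__ == '__main__':
    main()
```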
Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by much.
What it may do, however, is make the interval between values being available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, just the time for you to see them).
Try making your work function sleep for i*0.1 seconds, shuffling your input list, and printing i in your loops. You'll be able to see the difference between the two imap versions. Here's my version (the main function and the if __name__ == '__main__' boilerplate are required to run correctly on Windows):
```python
from multiprocessing import Pool
import time
import random

def work(i):
    time.sleep(0.1 * i)
    return i

def main():
    p = Pool(4)
    nums = list(range(50))  # list() so shuffle can work in place
    random.shuffle(nums)

    start = time.time()
    print('Using imap')
    for i in p.imap(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

    start = time.time()
    print('Using imap_unordered')
    for i in p.imap_unordered(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

if __name__ == "__main__":
    main()
```
The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
imap_unordered also seems to use less memory over time than imap. At least that's what I experienced with an iterator over millions of items.
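One reason this can work out: both imap variants pull from their input lazily, so you can feed them a generator without materializing the whole sequence up front. A sketch of that pattern (the task and sizes here are arbitrary placeholders):

```python
from multiprocessing import Pool

def square(i):
    return i * i

def numbers(n):
    # a lazy source: items are produced as the pool asks for them,
    # rather than being built into a full list as map() would require
    for i in range(n):
        yield i

def sum_of_squares(n):
    total = 0
    with Pool(4) as p:
        # chunksize batches items to cut per-task overhead; results
        # arrive as workers finish, in whatever order that happens
        for v in p.imap_unordered(square, numbers(n), chunksize=64):
            total += v
    return total

if __name__ == '__main__':
    print(sum_of_squares(1000))  # prints: 332833500
```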