The ordering of results from the returned iterator of imap_unordered
is arbitrary, and it doesn't seem to run faster than imap
(which I checked with the following code), so why would one use this method?
```python
from multiprocessing import Pool
import time

def square(i):
    time.sleep(0.01)
    return i ** 2

p = Pool(4)
nums = range(50)

start = time.time()
print('Using imap')
for i in p.imap(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))

start = time.time()
print('Using imap_unordered')
for i in p.imap_unordered(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))
```
Use the multiprocessing.Pool class when you need to execute tasks that may or may not take arguments and may or may not return a result once they are complete, and when you need to execute different kinds of ad hoc tasks, such as calling different target task functions.
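A minimal sketch of that "different target functions" pattern, using apply_async (the fetch and compute tasks here are hypothetical stand-ins):

```python
from multiprocessing import Pool

def fetch(name):
    # hypothetical ad hoc task of one kind
    return 'fetched ' + name

def compute(n):
    # hypothetical ad hoc task of another kind
    return n * n

def main():
    with Pool(2) as p:
        # apply_async submits individual tasks, so one pool can run
        # different target functions side by side (unlike map/imap,
        # which apply a single function to an iterable)
        r1 = p.apply_async(fetch, ('example',))
        r2 = p.apply_async(compute, (7,))
        print(r1.get(), r2.get())  # prints: fetched example 49

if __name__ == '__main__':
    main()
```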
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
Pool allows multiple jobs per process, which may make it easier to parallelize your program. If you have a number of jobs to run in parallel, you can create a Pool with as many processes as you have CPU cores and then pass the list of jobs to pool.map.
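A minimal sketch of that setup (square is a placeholder for your real job function):

```python
from multiprocessing import Pool, cpu_count

def square(i):
    return i * i

def main():
    jobs = list(range(50))
    # one worker process per CPU core, then hand the whole job list to map
    with Pool(processes=cpu_count()) as p:
        results = p.map(square, jobs)  # blocks until every job is done, keeps input order
    print(results[:5])  # prints: [0, 1, 4, 9, 16]

if __name__ == '__main__':
    main()
```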
Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by much.
What it may do, however, is make the interval between values being available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, just the time for you to see them).
Try making your work function sleep for i*0.1 seconds, shuffling your input list, and printing i in your loops. You'll be able to see the difference between the two imap versions. Here's my version (the main function and the if __name__ == '__main__' boilerplate are required to run correctly on Windows):
```python
from multiprocessing import Pool
import time
import random

def work(i):
    time.sleep(0.1 * i)
    return i

def main():
    p = Pool(4)
    nums = list(range(50))  # list() so shuffle can work in place
    random.shuffle(nums)

    start = time.time()
    print('Using imap')
    for i in p.imap(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

    start = time.time()
    print('Using imap_unordered')
    for i in p.imap_unordered(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

if __name__ == "__main__":
    main()
```
The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
imap_unordered also seems to use less memory over time than imap. At least that's what I experienced with an iterator over millions of items.
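One reason this can work out: both imap variants pull from their input lazily, so you can feed them a generator without materializing the whole sequence up front. A sketch of that pattern (the task and sizes here are arbitrary placeholders):

```python
from multiprocessing import Pool

def square(i):
    return i * i

def numbers(n):
    # a lazy source: items are produced as the pool asks for them,
    # rather than being built into a full list as map() would require
    for i in range(n):
        yield i

def sum_of_squares(n):
    total = 0
    with Pool(4) as p:
        # chunksize batches items to cut per-task overhead; results
        # arrive as workers finish, in whatever order that happens
        for v in p.imap_unordered(square, numbers(n), chunksize=64):
            total += v
    return total

if __name__ == '__main__':
    print(sum_of_squares(1000))  # prints: 332833500
```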