
In what situation do we need to use `multiprocessing.Pool.imap_unordered`?

Tags:

python

The results from the iterator returned by imap_unordered arrive in arbitrary order, and it doesn't seem to run any faster than imap (which I checked with the following code), so why would one use this method?

    from multiprocessing import Pool
    import time

    def square(i):
        time.sleep(0.01)
        return i ** 2

    p = Pool(4)
    nums = range(50)

    start = time.time()
    print 'Using imap'
    for i in p.imap(square, nums):
        pass
    print 'Time elapsed: %s' % (time.time() - start)

    start = time.time()
    print 'Using imap_unordered'
    for i in p.imap_unordered(square, nums):
        pass
    print 'Time elapsed: %s' % (time.time() - start)
satoru asked Sep 28 '13 04:09





2 Answers

Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by too much.

What it may do, however, is make the interval between values being available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, just the time for you to see them).
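
Because imap_unordered can hand values back in a different order than the inputs, a consumer that still needs to know which input produced which value has to carry that information in the result itself. One way to do that is sketched below, in Python 3 syntax. The names square_with_input and collect_squares are hypothetical helpers made up for this illustration, not part of multiprocessing.

```python
from multiprocessing import Pool


def square_with_input(i):
    """Return the input together with its square so the pair survives reordering."""
    return i, i ** 2


def collect_squares(pool, nums):
    """Build an {input: square} dict from results that may arrive in any order."""
    return dict(pool.imap_unordered(square_with_input, nums))


if __name__ == "__main__":
    with Pool(4) as pool:
        squares = collect_squares(pool, range(10))
    print(squares[7])  # 49, regardless of the order the results arrived in
```

If the arrival order really does not matter (summing, writing rows to a database, updating a progress bar), the pairing step can be skipped and the values consumed as they come.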

Try making your work function sleep for i*0.1 seconds, shuffling your input list and printing i in your loops. You'll be able to see the difference between the two imap versions. Here's my version (the main function and the if __name__ == '__main__' boilerplate are required for it to run correctly on Windows):

    from multiprocessing import Pool
    import time
    import random

    def work(i):
        time.sleep(0.1*i)
        return i

    def main():
        p = Pool(4)
        nums = range(50)
        random.shuffle(nums)

        start = time.time()
        print 'Using imap'
        for i in p.imap(work, nums):
            print i
        print 'Time elapsed: %s' % (time.time() - start)

        start = time.time()
        print 'Using imap_unordered'
        for i in p.imap_unordered(work, nums):
            print i
        print 'Time elapsed: %s' % (time.time() - start)

    if __name__ == "__main__":
        main()

The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
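
The pause pattern described above is easier to see if each value is recorded together with the moment it became available. The following is a minimal sketch of that idea in Python 3 syntax, with the slow task deliberately placed first. The helper timed_results and the chosen delays are assumptions made for this illustration.

```python
import time
from multiprocessing import Pool


def work(delay):
    """Sleep for `delay` seconds, then return the delay (a stand-in for real work)."""
    time.sleep(delay)
    return delay


def timed_results(pool_method, delays):
    """Run `work` over `delays` and return (seconds_since_start, value) pairs."""
    start = time.time()
    return [(time.time() - start, value) for value in pool_method(work, delays)]


def main():
    delays = [0.4, 0.1, 0.1, 0.1]  # the slow task comes first on purpose
    with Pool(4) as pool:
        for name in ("imap", "imap_unordered"):
            print(name)
            for elapsed, value in timed_results(getattr(pool, name), delays):
                print("  %.2fs -> %s" % (elapsed, value))


if __name__ == "__main__":
    main()
```

With imap, nothing should appear until the 0.4 second task is done, and then all four values arrive together in input order. With imap_unordered, the three short tasks should show up first and the slow one last. The total time is about the same in both cases.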

Blckknght answered Sep 25 '22 11:09


imap_unordered also seems to use less memory over time than imap. At least that's what I experienced with an iterator over millions of things.
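
One plausible explanation, offered as an assumption rather than a measurement: an ordered iterator has to hold on to any result that finishes early until every result ahead of it has been yielded, while an unordered one can hand each result over as soon as it is ready. The toy model below, in Python 3 syntax, counts how many finished results an ordered iterator would be holding at its worst. It is a simplified illustration and not the library's actual implementation, and peak_buffered is a hypothetical name.

```python
def peak_buffered(completion_order):
    """Return the peak number of finished results an ordered iterator holds back.

    `completion_order` lists task indices in the order the tasks finish.
    """
    waiting = set()   # finished results that cannot be yielded yet
    next_index = 0    # index the ordered iterator must yield next
    peak = 0
    for index in completion_order:
        waiting.add(index)
        # Release every result that is now next in line.
        while next_index in waiting:
            waiting.remove(next_index)
            next_index += 1
        peak = max(peak, len(waiting))
    return peak


# Task 0 is slow and finishes last, so results 1, 2 and 3 pile up behind it.
print(peak_buffered([1, 2, 3, 0]))  # 3
# Tasks finish in input order, so nothing ever has to wait.
print(peak_buffered([0, 1, 2, 3]))  # 0
```

In this model an unordered iterator never holds anything back, which would be consistent with the lower memory use reported here when a few slow items sit in front of millions of fast ones.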

Ed Summers answered Sep 25 '22 11:09