
Iteration over pool.imap_unordered

Consider this very simple code:

#!/usr/bin/python

from multiprocessing import Pool
import random

def f(x):
    return x*x

def sampleiter(n):
    num = 0
    while num < n:
        rand = random.random()
        yield rand
        num += 1

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    for item in pool.imap_unordered(f, sampleiter(100000000000000), 20):
        print item
    pool.close

While this runs in the terminal, the Python process keeps leaking memory.
What could be wrong?

asked Mar 25 '12 by user1291515

People also ask

Is imap_unordered faster than imap?

Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by much. What it may do, however, is make the interval between values becoming available in your iteration more even.
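(A tiny illustrative sketch, not from the question itself: pool.imap yields results in input order, while pool.imap_unordered yields them roughly in completion order. The work function and sleep times below are made up so the difference is visible.)

from multiprocessing import Pool
import time

def work(x):
    # later inputs finish sooner, so the ordering difference shows up
    time.sleep(0.5 / (x + 1))
    return x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(list(pool.imap(work, range(8))))            # always [0, 1, ..., 7]
    print(list(pool.imap_unordered(work, range(8))))  # roughly completion order
    pool.close()
    pool.join()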


What is Pool.imap?

Unlike the Pool.map() function, the Pool.imap() function will iterate the provided iterable one item at a time and issue tasks to the process pool. It will also yield return values as tasks are completed, rather than all at once after all tasks have completed.
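(Another small sketch for illustration, with a made-up slow_square helper: Pool.map blocks and returns the whole result list at once, while Pool.imap pulls inputs lazily and streams results back.)

from multiprocessing import Pool
import time

def slow_square(x):
    time.sleep(0.5)
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=2)

    # Pool.map blocks here until every task has finished, then returns a list
    print(pool.map(slow_square, range(4)))

    # Pool.imap yields each result as soon as it is ready (in input order)
    # and only pulls items from the iterable as workers need them
    for result in pool.imap(slow_square, range(4)):
        print(result)

    pool.close()
    pool.join()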

How do you check multiprocessing progress in Python?

We can show the progress of tasks in the process pool using a callback function. This can be achieved by issuing tasks asynchronously to the process pool, such as via the apply_async() function, and specifying a callback function via the “callback” argument.
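(A minimal sketch of that callback approach; on_result and the counter are invented names. The callback runs in a thread of the parent process each time a task finishes.)

from multiprocessing import Pool

def f(x):
    return x * x

total = 1000
done = [0]   # mutable counter updated by the callback

def on_result(result):
    # called in the parent process whenever a task completes
    done[0] += 1
    if done[0] % 100 == 0:
        print('%d/%d tasks finished' % (done[0], total))

if __name__ == '__main__':
    pool = Pool(processes=4)
    for x in range(total):
        pool.apply_async(f, (x,), callback=on_result)
    pool.close()
    pool.join()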


1 Answer

Output buffering isn't the problem (or at least, not the only one), because (a) the Python process itself grows in memory, and (b) if you redirect to /dev/null it still happens.

I think the issue is that when you print out the results, the pool is returning results much faster than they can be consumed, so more and more results pile up in memory. If you look at the source of the class that handles this (IMapUnorderedIterator in multiprocessing/pool.py), you'll see that intermediate results are stored in a collections.deque called _items; I'd wager that _items is getting huge.

I'm not entirely sure how to test this directly, though: even though imap_unordered creates an instance of that class, passing a chunksize makes it hand back a plain generator wrapped around the iterator, so all you can get at are the generator methods:

In [8]: r = pool.imap_unordered(f, sampleiter(1e8), 20)

In [9]: print dir(r)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__',
 '__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
 'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']
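(One hedged way to poke at that backlog, leaning on CPython implementation details: call imap_unordered without a chunksize, so it returns the IMapUnorderedIterator itself rather than a generator around it, and then peek at the private _items deque. The numbers here are just for illustration.)

from multiprocessing import Pool
import random

def f(x):
    return x * x

def sampleiter(n):
    num = 0
    while num < n:
        yield random.random()
        num += 1

if __name__ == '__main__':
    pool = Pool(processes=4)
    # no chunksize: CPython hands back the IMapUnorderedIterator directly
    r = pool.imap_unordered(f, sampleiter(100000000))
    for i, item in enumerate(r):
        print(item)
        if i % 100000 == 0:
            # _items is the private collections.deque of results that have
            # arrived from the workers but not yet been consumed here
            print('backlog: %d items' % len(r._items))
    pool.close()
    pool.join()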

Update: if you add a time.sleep(.01) to f(), memory usage stays completely constant. So, yeah, the problem is that you're producing results faster than you can use them.
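(If you need memory to stay bounded rather than just slowing the workers down, one sketch, not from the original answer, is to throttle the input generator with a threading.Semaphore so that only a limited number of tasks are ever in flight; bounded_sampleiter and max_in_flight are invented names. The generator is consumed by the pool's task-handler thread inside the parent process, so an ordinary threading primitive is enough.)

from multiprocessing import Pool
import threading
import random

def f(x):
    return x * x

def bounded_sampleiter(n, sem):
    # block before handing out the next input, so roughly max_in_flight
    # inputs/results exist at any moment
    num = 0
    while num < n:
        sem.acquire()
        yield random.random()
        num += 1

if __name__ == '__main__':
    max_in_flight = 1000   # rough cap on queued inputs plus buffered results
    sem = threading.Semaphore(max_in_flight)
    pool = Pool(processes=4)
    # with a chunksize the bound is only approximate, give or take a chunk
    for item in pool.imap_unordered(f, bounded_sampleiter(100000000000000, sem), 20):
        print(item)
        sem.release()      # one result consumed, allow one more input in
    pool.close()
    pool.join()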

(As an aside: you mean pool.close() at the end of your code sample; pool.close is just a reference to the function and doesn't actually call it.)

answered Sep 18 '22 by Danica