Consider this very simple code:
#!/usr/bin/python
from multiprocessing import Pool
import random

def f(x):
    return x * x

def sampleiter(n):
    num = 0
    while num < n:
        rand = random.random()
        yield rand
        num += 1

if __name__ == '__main__':
    pool = Pool(processes=4)  # start 4 worker processes
    for item in pool.imap_unordered(f, sampleiter(100000000000000), 20):
        print item
    pool.close
While it runs in the terminal, Python keeps leaking memory. What could be wrong?
Output buffering isn't the problem (or at least, not the only one), because (a) the Python process itself grows in memory, and (b) it still happens if you redirect to /dev/null.
I think the issue is that when you print out the results, the pool is returning results much faster than they can be consumed, so lots and lots of results sit in memory. If you look at the source of the class that handles this (IMapUnorderedIterator in multiprocessing/pool.py), intermediate results are stored in a collections.deque called _items; I'd wager that _items is getting huge.
I'm not entirely sure how to test this, though, because even though imap_unordered returns an instance of this class, you still seem to only be able to get at the generator methods:
In [8]: r = pool.imap_unordered(f, sampleiter(1e8), 20)
In [9]: print dir(r)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__',
'__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']
Update: if you add a time.sleep(.01) to f(), memory usage stays completely constant. So, yeah, the problem is that you're producing results faster than you can consume them.
(As an aside: you mean pool.close() at the end of your code sample; pool.close is just a reference to the method and doesn't actually call it.)