I generate variations with repetition of n elements (named "pairs" below) and use them as my function's argument. Everything works fine as long as the "r" list is small enough not to consume all the memory, but eventually I have to make 16 or more repetitions of 6 elements. I use a 40-core system in the cloud for this.
The code looks like the following:
import itertools
from multiprocessing import Pool

# 'pairs' and 'f' are defined elsewhere
if __name__ == '__main__':
    pool = Pool(39)
    r = itertools.product(pairs, repeat=16)
    pool.map(f, r)
I believe I should use an iterator instead of creating the huge list upfront, and this is where the problem starts.
I tried to solve the issue with the following code:
import itertools
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(39)
    for r in itertools.product(pairs, repeat=14):
        pool.map(f, r)
The memory problem goes away, but the CPU usage is only about 5% per core. Now the single-core version of the code is faster than this.
I'd really appreciate it if you could guide me a bit. Thanks.
Your original code isn't creating a list upfront in your own code (itertools.product returns a generator), but pool.map is realizing the whole generator (because it assumes that if you can store all outputs, you can store all inputs too).
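To get a feel for the scale involved: with the 6 elements and 16 repetitions mentioned in the question (a rough sketch on my part, assuming 6 distinct pairs), the number of input tuples is len(pairs) ** repeat, which is already in the trillions, so storing them all up front is hopeless:

# Sketch only: 6 hypothetical pairs standing in for the question's real data.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# itertools.product is lazy, so creating the product itself costs nothing;
# the closed form below just shows how many tuples pool.map would try to store.
print(len(pairs) ** 16)  # 2821109907456 -> roughly 2.8 trillion input tuples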
Don't use pool.map here. If you need ordered results, use pool.imap, or if result order is unimportant, use pool.imap_unordered. Iterate the result of either call (don't wrap it in list), process the results as they come, and memory should not be an issue:
import itertools
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(39)
    for result in pool.imap(f, itertools.product(pairs, repeat=16)):
        print(result)
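One related knob worth experimenting with, if f is cheap per item: imap and imap_unordered accept an optional chunksize argument that batches the inputs sent to each worker, cutting the per-item IPC overhead (the 10000 below is only an illustrative guess, not a tuned value):

import itertools
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(39)
    # Larger chunksize = fewer IPC round-trips, at the cost of coarser load balancing.
    for result in pool.imap(f, itertools.product(pairs, repeat=16), chunksize=10000):
        print(result)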
If you're using pool.map for side effects, so you just need to run it to completion but the results and ordering don't matter, you could dramatically improve performance by using imap_unordered and using collections.deque to efficiently drain the "results" without actually storing anything (a deque with a maxlen of 0 is the fastest, lowest-memory way to force an iterator to run to completion without storing the results):
import itertools
from collections import deque
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(39)
    deque(pool.imap_unordered(f, itertools.product(pairs, repeat=16)), 0)
Lastly, I'm a little suspicious of specifying 39 Pool workers; multiprocessing is largely beneficial for CPU-bound tasks. If you're using more workers than you have CPU cores and gaining a benefit, it's possible multiprocessing is costing you more in IPC than it gains, and using more workers is just masking the problem by buffering more data.
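If the work really is CPU-bound, a reasonable starting point is to let the pool size itself from the machine rather than hard-coding 39; Pool() with no argument defaults to os.cpu_count() workers. A minimal sketch:

import os
from multiprocessing import Pool

if __name__ == '__main__':
    # With no argument, Pool() creates os.cpu_count() worker processes.
    pool = Pool()
    print('worker processes:', os.cpu_count())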
If your work is largely I/O bound, you might try using a thread-based pool, which will avoid the overhead of pickling and unpickling, as well as the cost of IPC between parent and child processes. Unlike process-based pools, Python threading is subject to GIL issues, so your CPU-bound work in Python (excluding GIL-releasing calls for I/O, ctypes calls into .dll/.so files, and certain third-party extensions like numpy that release the GIL for heavy CPU work) is limited to a single core (and in Python 2.x for CPU-bound work you often waste a decent amount of that resolving GIL contention and performing context switches; Python 3 removes most of the waste). But if your work is largely I/O bound, blocking on I/O releases the GIL to allow other threads to run, so you can have many threads as long as most of them delay on I/O.

It's easy to switch too (as long as you haven't designed your program to rely on separate address spaces for each worker by assuming you can write to "shared" state and not affect other workers or the parent process); just change:
from multiprocessing import Pool
to:
from multiprocessing.dummy import Pool
and you get the multiprocessing.dummy version of the pool, based on threads instead of processes.
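A minimal self-contained sketch of the thread-based variant (f and pairs below are hypothetical stand-ins, since the real ones aren't shown in the question):

import itertools
from multiprocessing.dummy import Pool  # thread-based pool with the same API

# Hypothetical stand-ins for the question's pairs and f, for illustration only.
pairs = [(0, 1), (1, 2), (2, 3)]

def f(combo):
    # Placeholder work; the real f would do the actual (ideally I/O-bound) job.
    return sum(a * b for a, b in combo)

if __name__ == '__main__':
    pool = Pool(39)  # threads share one address space, so no pickling or IPC
    for result in pool.imap_unordered(f, itertools.product(pairs, repeat=3)):
        pass  # consume results as they arrive without storing them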