
Python itertools with multiprocessing - huge list vs inefficient CPU usage with iterator

I am working on variations with repetition of n elements (named "pairs" below), used as my function's argument. Obviously everything works fine as long as the "r" list is not big enough to consume all the memory. The issue is that I eventually have to make more than 16 repetitions of 6 elements. I use a 40-core system in the cloud for this.

The code looks like the following:

import itertools
from multiprocessing import Pool

# f (the worker function) and pairs (the elements) are defined elsewhere
if __name__ == '__main__':
    pool = Pool(39)
    r = itertools.product(pairs, repeat=16)
    pool.map(f, r)

I believe I should use an iterator instead of creating the huge list upfront, and here the problem starts.

I tried to solve the issue with the following code:

if __name__ == '__main__':
    pool = Pool(39)
    for r in itertools.product(pairs, repeat=14):
        pool.map(f, r)

The memory problem goes away but the CPU usage is around 5% per core. Now the single-core version of the code is faster than this.

I'd really appreciate it if you could guide me a bit.

Thanks.

asked Aug 08 '16 by xis_one

1 Answer

Your original code isn't creating a huge list upfront itself (itertools.product returns a lazy iterator), but pool.map realizes the whole iterable before dispatching any work (it assumes that if you can store all the outputs, you can store all the inputs too).

Don't use pool.map here. If you need ordered results, use pool.imap; if result order is unimportant, use pool.imap_unordered. Iterate over the result of either call (don't wrap it in list), process the results as they come, and memory should not be an issue:

if __name__ == '__main__':
    pool = Pool(39)
    for result in pool.imap(f, itertools.product(pairs, repeat=16)):
        print(result)
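
As a side note, both imap and imap_unordered accept a chunksize argument; with the default of 1, every input tuple is a separate IPC round-trip, which can leave the workers starved if f is cheap per item. A minimal sketch of the same loop with batching (the value 1000 is only an assumed starting point to tune for your workload):

if __name__ == '__main__':
    pool = Pool(39)
    # chunksize batches inputs per worker dispatch, reducing IPC overhead
    results = pool.imap(f, itertools.product(pairs, repeat=16), chunksize=1000)
    for result in results:
        print(result)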

If you're calling pool.map only for its side-effects, so you just need it to run to completion and neither the results nor their ordering matter, you can dramatically improve performance by using imap_unordered and draining the "results" with collections.deque without actually storing anything (a deque with a maxlen of 0 is the fastest, lowest-memory way to force an iterator to run to completion without storing its results):

from collections import deque

if __name__ == '__main__':
    pool = Pool(39)
    deque(pool.imap_unordered(f, itertools.product(pairs, repeat=16)), maxlen=0)

Lastly, I'm a little suspicious of specifying 39 Pool workers; multiprocessing is largely beneficial for CPU-bound tasks. If you're using more workers than you have CPU cores and still gaining a benefit, it's possible that multiprocessing is costing you more in IPC than it gains, and using more workers is just masking the problem by buffering more data.
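
For what it's worth, if you'd rather not hard-code a worker count at all, Pool() with no argument defaults to os.cpu_count() worker processes, so something like this keeps the pool matched to the machine (f and pairs as in the question):

import itertools
from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool()  # defaults to os.cpu_count() worker processes
    for result in pool.imap(f, itertools.product(pairs, repeat=16)):
        print(result)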

If your work is largely I/O-bound, you might try using a thread-based pool, which avoids the overhead of pickling and unpickling, as well as the cost of IPC between parent and child processes. Unlike process-based pools, Python threading is subject to GIL issues, so CPU-bound work in Python (excluding GIL-releasing calls for I/O, ctypes calls into .dll/.so files, and certain third-party extensions like numpy that release the GIL for heavy CPU work) is limited to a single core (and in Python 2.x, CPU-bound work often wastes a decent amount of that core resolving GIL contention and performing context switches; Python 3 removes most of that waste). But if your work is largely I/O-bound, blocking on I/O releases the GIL to allow other threads to run, so you can have many threads as long as most of them are waiting on I/O.

It's easy to switch, too (as long as you haven't designed your program to rely on separate address spaces for each worker by assuming you can write to "shared" state without affecting other workers or the parent process): just change:

from multiprocessing import Pool

to:

from multiprocessing.dummy import Pool

and you get the multiprocessing.dummy version of the pool, based on threads instead of processes.
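
Put together, a thread-based version of the earlier side-effects sketch would look like this (same caveats as above; multiprocessing.dummy.Pool exposes the same imap/imap_unordered API, and 39 threads only pay off if f spends most of its time waiting on I/O):

import itertools
from collections import deque
from multiprocessing.dummy import Pool  # thread-based pool, same interface

if __name__ == '__main__':
    pool = Pool(39)  # threads, not processes
    deque(pool.imap_unordered(f, itertools.product(pairs, repeat=16)), maxlen=0)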

answered by ShadowRanger