I am using multiprocessing.imap_unordered to perform a computation on a list of values:
import multiprocessing

def process_parallel(fnc, some_list):
    pool = multiprocessing.Pool()
    for result in pool.imap_unordered(fnc, some_list):
        for x in result:
            yield x
    pool.terminate()
Each call to fnc returns a HUGE object as a result, by design. I can store N instances of such an object in RAM, where N ~ cpu_count, but not many more (certainly not hundreds).
Now, using this function takes up too much memory. The memory is entirely spent in the main process, not in the workers.
How does imap_unordered store the finished results? I mean the results that have already been returned by workers but not yet passed on to the user. I thought it was smart and only computed them "lazily" as needed, but apparently not.
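For reference, here is a minimal sketch that reproduces the effect (the sizes and counts are made up): the workers finish much faster than the consumer, so finished results accumulate in the parent process and its resident memory grows steadily.

import multiprocessing
import time

def fnc(i):
    # Stand-in for the real work: each call returns a "huge" result.
    return [i] * (10 ** 7)

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    for result in pool.imap_unordered(fnc, range(100)):
        time.sleep(1)  # slow consumer: watch the parent process's RSS grow
    pool.terminate()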
It looks like, since I cannot consume the results of process_parallel fast enough, the pool keeps queueing these huge objects from fnc somewhere, internally, and then blows up. Is there a way to avoid this? Can I limit its internal queue somehow?
I'm using Python 2.7. Cheers.
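For what it's worth, one general way to bound this yourself, independent of multiprocessing internals, is to throttle the pool's task-feeder thread with a semaphore, so only a bounded number of results can ever be in flight at once. A sketch (process_parallel_bounded and max_pending are made-up names, and this is not tested at scale):

import multiprocessing
import threading

def process_parallel_bounded(fnc, some_list, max_pending=None):
    # Hand at most max_pending items to the pool ahead of the consumer;
    # consuming a result frees one slot for a new task.
    if max_pending is None:
        max_pending = multiprocessing.cpu_count()
    sem = threading.Semaphore(max_pending)

    def throttled(items):
        for item in items:
            sem.acquire()  # blocks the pool's internal feeder thread when full
            yield item

    pool = multiprocessing.Pool()
    try:
        for result in pool.imap_unordered(fnc, throttled(some_list)):
            sem.release()  # one result consumed, so allow one more task in
            for x in result:
                yield x
    finally:
        for _ in range(max_pending):
            sem.release()  # unblock the feeder thread if we stopped early
        pool.terminate()

This keeps roughly max_pending of the huge results alive at any time; the exact slack depends on chunk size and timing.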
As you can see by looking into the corresponding source file (python2.7/multiprocessing/pool.py), the IMapUnorderedIterator uses a collections.deque instance for storing the results: whenever a new item comes in from a worker it is appended, and iteration pops items off again. As you suggested, if more huge objects arrive while the main thread is still processing one, they will all be kept in memory.
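Concretely, the relevant machinery looks roughly like this (abridged from memory from python2.7/multiprocessing/pool.py; check your exact copy, as details vary between versions). The result-handler thread delivers every finished task through _set(), which takes self._cond before appending to the deque, and next() takes the same condition before popping:

class IMapIterator(object):
    def __init__(self, cache):
        self._cond = threading.Condition(threading.Lock())
        self._items = collections.deque()  # finished but unconsumed results
        ...

class IMapUnorderedIterator(IMapIterator):
    def _set(self, i, obj):
        # Called from the result-handler thread for each finished task.
        self._cond.acquire()
        try:
            self._items.append(obj)  # unbounded: results pile up here
            self._index += 1
            self._cond.notify()
        finally:
            self._cond.release()

Nothing ever blocks _set(), so the deque grows as fast as the workers can produce results.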
What you might try is something like this:
it = pool.imap_unordered(fnc, some_list)
for result in it:
    it._cond.acquire()  # hold the lock the result handler needs for _set()
    for x in result:
        yield x
    it._cond.release()  # let the next finished result be appended
This should cause the task-result-receiver thread to block while you process an item, if it is trying to put the next object into the deque. Thus there should not be more than two of the huge objects in memory at once. Whether that works for your case, I don't know ;)
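If you go this route, a slightly safer variant (my tweak, not part of the original suggestion) is to use the condition as a context manager, so the lock is released even if the loop body raises; it still pokes at the private _cond attribute, though:

it = pool.imap_unordered(fnc, some_list)
for result in it:
    with it._cond:  # released even if the body raises
        for x in result:
            yield x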