 

Python multiprocessing for parallel processes

I'm sorry if this is too simple for some people, but I still don't get the trick with python's multiprocessing. I've read
http://docs.python.org/dev/library/multiprocessing
http://pymotw.com/2/multiprocessing/basics.html and many other tutorials and examples that google gives me... many of them from here too.

Well, my situation is that I have to compute many numpy matrices and then accumulate them into a single numpy matrix. Let's say I want to use 20 cores (or that I can use 20 cores), but I haven't managed to use the Pool resource successfully, since it keeps the processes alive until the pool itself "dies". So I thought of doing something like this:

from multiprocessing import Process, Queue
import numpy as np

def f(q, i):
    q.put(np.zeros((4, 4)))

if __name__ == '__main__':
    q = Queue()
    for i in range(30):
        p = Process(target=f, args=(q, i))
        p.start()
        p.join()
    result = q.get()
    while not q.empty():
        result += q.get()
    print(result)

But then it looks like the processes don't run in parallel; they run sequentially (please correct me if I'm wrong), and I don't know whether they die after finishing their computation (so that, with more than 20 processes, the ones that have done their part free a core for another process). Plus, for a very large number of iterations (say 100,000), storing all those matrices (which may be really big too) in a queue would use a lot of memory, rendering the code useless. The idea is to fold each result into the final matrix as soon as it is produced, perhaps protected by a lock (with its acquire() and release() methods), but if this code doesn't actually run in parallel, the lock is pointless too...

I hope somebody may help me.

Thanks in advance!

Carlos asked Jan 06 '12




1 Answer

You are correct, they are executing sequentially in your example.

p.join() causes the current thread to block until the process p has finished executing. You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it, as in the sketch below), or use something like multiprocessing.Pool with apply_async and a callback. That also lets you add each value to your result directly rather than keeping all the objects around.
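As a minimal sketch of the first option (keeping your Queue-based f; the 30 workers are just carried over from your example, and starting that many at once is only illustrative):

    from multiprocessing import Process, Queue
    import numpy as np

    def f(q, i):
        q.put(np.zeros((4, 4)))

    if __name__ == '__main__':
        q = Queue()
        procs = [Process(target=f, args=(q, i)) for i in range(30)]
        for p in procs:
            p.start()          # start everything before joining anything

        result = np.zeros((4, 4))
        for _ in procs:
            result += q.get()  # blocks until some child has put its matrix

        for p in procs:        # drain the queue first, then join,
            p.join()           # so the children can't block on a full queue

        print(result)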

With Pool and apply_async, for example:

from multiprocessing import Pool
import numpy as np

def f(i):
    return i * np.identity(4)

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    def adder(value):
        # The callback runs in the main process, so it can safely
        # accumulate into the shared result matrix.
        global result
        result += value

    for i in range(30):
        p.apply_async(f, args=(i,), callback=adder)
    p.close()
    p.join()
    print(result)

Closing and then joining the pool at the end ensures that the pool's processes have completed and the result object is finished being computed. You could also investigate using Pool.imap as a solution to your problem. That particular solution would look something like this:

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    # imap_unordered yields each worker's return value as soon as it is
    # ready, so only one matrix at a time needs to be handled here.
    im = p.imap_unordered(f, range(30), chunksize=5)

    for x in im:
        result += x

    print(result)

This is cleaner for your specific situation, but may not be for whatever you are ultimately trying to do.

As for storing all of your results: if I understand your question correctly, you can simply accumulate each one into result in the callback (as above), or item-at-a-time using imap/imap_unordered (which still buffers results internally, but the buffer drains as you consume it). Either way, each matrix only needs to be kept for as long as it takes to add it to result.
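As a rough sketch of that item-at-a-time accumulation at the scale you mentioned (100,000 tasks; the pool size of 20 and the chunksize of 100 are only illustrative assumptions), something like this keeps just one returned matrix in the parent at any moment:

    from multiprocessing import Pool
    import numpy as np

    def f(i):
        return i * np.identity(4)

    if __name__ == '__main__':
        p = Pool(20)                    # assumed number of available cores
        result = np.zeros((4, 4))

        # Each matrix is folded into result as soon as it arrives, so the
        # parent never holds the full set of 100,000 results at once.
        for value in p.imap_unordered(f, range(100000), chunksize=100):
            result += value

        p.close()
        p.join()
        print(result)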

David H. Clements answered Oct 06 '22