
Large objects and `multiprocessing` pipes and `send()`

I've recently found that if we create a pair of parent-child connection objects using multiprocessing.Pipe, and the object obj we try to send through the pipe is too large, the program hangs without raising an exception or doing anything at all. See the code below. (It uses the numpy package to produce a large array of floats.)

import multiprocessing as mp
import numpy as np

def big_array(conn, size=1200):
    a = np.random.rand(size)
    print "Child process trying to send array of %d floats." % size
    conn.send(a)
    return a

if __name__ == "__main__":
    print "Main process started."
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=big_array, args=[child_conn, 1200])
    proc.start()
    print "Child process started."
    proc.join()
    print "Child process joined."
    a = parent_conn.recv()
    print "Received the following object."
    print "Type: %s. Size: %d." % (type(a), len(a))

The output is the following.

Main process started.
Child process started.
Child process trying to send array of 1200 floats.

And it hangs there indefinitely. However, if we send an array of 1000 floats instead of 1200, the program runs to completion, with the following output as expected.

Main process started.
Child process started.
Child process trying to send array of 1000 floats.
Child process joined.
Received the following object.
Type: <type 'numpy.ndarray'>. Size: 1000.
Press any key to continue . . .

This looks like a bug to me. The documentation says the following.

send(obj): Send an object to the other end of the connection which should be read using recv().

The object must be picklable. Very large pickles (approximately 32 MB+, though it depends on the OS) may raise a ValueError exception.

But in my run not even a ValueError was raised; the program just hangs. Moreover, the 1200-element numpy array is only 9600 bytes (1200 floats × 8 bytes each), nowhere near 32 MB. Does anyone know how to solve this problem?

By the way, I'm using Windows 7, 64-bit.

Asked by Ray on Feb 28 '13.
1 Answer

Try moving join() below recv():

import multiprocessing as mp

def big_array(conn, size=1200):
    a = "a" * size                     # a large string stands in for the numpy array
    print "Child process trying to send payload of %d bytes." % size
    conn.send(a)
    return a

if __name__ == "__main__":
    print "Main process started."
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=big_array, args=[child_conn, 120000])
    proc.start()
    print "Child process started."
    a = parent_conn.recv()             # receive first, so the child's send() can complete
    proc.join()                        # now the child can exit and join() returns
    print "Child process joined."
    print "Received the following object."
    print "Type: %s. Size: %d." % (type(a), len(a))

But I don't really understand why your example works even for small sizes. I thought that writing to the pipe and then joining the process without first reading the data from the pipe would block the join. You should first receive from the pipe, then join. But apparently it does not block for small sizes...?
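
My guess (an assumption about OS-level pipe behavior, not something the Pipe docs spell out): send() pickles the object and writes the bytes into the OS pipe buffer. If the whole pickle fits in the buffer, send() returns immediately and the child can exit, so join() succeeds; once the payload exceeds the buffer, send() blocks until the parent reads, and joining first deadlocks. The sketch below probes roughly where that cutoff lies; the sizes and the 2-second timeout are arbitrary values I picked, and the exact threshold is platform-dependent.

import multiprocessing as mp

def sender(conn, nbytes):
    # send() pickles the payload and writes it to the underlying OS pipe;
    # it blocks once the pickle no longer fits in the pipe buffer
    conn.send("x" * nbytes)

if __name__ == "__main__":
    # sizes and timeout are arbitrary values chosen for illustration
    for nbytes in (1024, 8192, 65536, 1 << 20):
        parent_conn, child_conn = mp.Pipe()
        proc = mp.Process(target=sender, args=[child_conn, nbytes])
        proc.start()
        proc.join(timeout=2)          # give the child a chance to finish on its own
        blocked = proc.is_alive()     # still alive => send() is still blocking
        print "%8d bytes: %s" % (nbytes, "blocked" if blocked else "sent")
        if blocked:
            parent_conn.recv()        # drain the pipe so the child can exit
            proc.join()

If I remember right, on Windows multiprocessing creates its named pipes with an 8192-byte buffer, which would line up with your observations: 1000 floats (~8 KB pickled) squeaks through, while 1200 floats (9600 bytes) stalls.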

Edit: from the docs (http://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming):

"An example which will deadlock is the following:"

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()
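
The docs' fix for that example is to swap the last two lines (or simply remove the p.join() line), which mirrors the reordering above:

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()           # drain the queue first...
    p.join()                    # ...then join; no deadlock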
Answered by Ecir Hana on Dec 09 '22.