Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiprocessing IOError: bad message length

I get an IOError: bad message length when passing large arguments to the map function. How can I avoid this? The error occurs when I set N=1500 or bigger.

The code is:

import numpy as np
import multiprocessing

def func(args):
    i=args[0]
    images=args[1]
    print i
    return 0

N=1500       #N=1000 works fine

images=[]
for i in np.arange(N):
    images.append(np.random.random_integers(1,100,size=(500,500)))

iter_args=[]
for i in range(0,1):
    iter_args.append([i,images])

pool=multiprocessing.Pool()
print pool
pool.map(func,iter_args)

In the docs of multiprocessing there is the function recv_bytes that raises an IOError. Could it be because of this? (https://python.readthedocs.org/en/v2.7.2/library/multiprocessing.html)

EDIT If I use images as a numpy array instead of a list, I get a different error: SystemError: NULL result without error in PyObject_Call. A bit different code:

import numpy as np
import multiprocessing

def func(args):
    i=args[0]
    images=args[1]
    print i
    return 0

N=1500       #N=1000 works fine

images=[]
for i in np.arange(N):
    images.append(np.random.random_integers(1,100,size=(500,500)))
images=np.array(images)                                            #new

iter_args=[]
for i in range(0,1):
    iter_args.append([i,images])

pool=multiprocessing.Pool()
print pool
pool.map(func,iter_args)

EDIT2 The actual function that I use is:

def func(args):
    i=args[0]
    images=args[1]
    image=np.mean(images,axis=0)
    np.savetxt("image%d.txt"%(i),image)
    return 0

Additionally, the iter_args do not contain the same set of images:

iter_args=[]
for i in range(0,1):
    rand_ind=np.random.random_integers(0,N-1,N)
    iter_args.append([i,images[rand_ind]])
like image 903
Andy Avatar asked Jun 14 '15 20:06

Andy


1 Answers

You're creating a pool and sending all the images at once to func(). If you can get away with working on a single image at once, try something like this, which runs to completion with N=10000 in 35s with Python 2.7.10 for me:

import numpy as np
import multiprocessing

def func(args):
    i = args[0]
    img = args[1]
    print "{}: {} {}".format(i, img.shape, img.sum())
    return 0

N=10000

images = ((i, np.random.random_integers(1,100,size=(500,500))) for i in xrange(N))
pool=multiprocessing.Pool(4)
pool.imap(func, images)
pool.close()
pool.join()

The key here is to use iterators so you don't have to hold all the data in memory at once. For instance I converted images from an array holding all the data to a generator expression to create the image only when needed. You could modify this to load your images from disk or whatever. I also used pool.imap instead of pool.map.

If you can, try to load the image data in the worker function. Right now you have to serialize all the data and ship it across to another process. If your image data is larger, this might be a bottleneck.

[update now that we know func has to handle all images at once]

You could do an iterative mean on your images. Here's a solution without using multiprocessing. To use multiprocessing, you could divide your images into chunks, and farm those chunks out to the pool.

import numpy as np

N=10000
shape = (500,500)

def func(images):
    average = np.full(shape, 0)
    for i, img in images:
        average += img / N
    return average

images = ((i, np.full(shape,i)) for i in range(N))

print func(images)
like image 110
Joseph Sheedy Avatar answered Oct 21 '22 07:10

Joseph Sheedy