
How do I avoid this pickling error, and what is the best way to parallelize this code in Python?

I have the following code.

def main():
  (minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv
  for i in range(minI, maxI, iStep):
    for j in range(minJ, maxJ, jStep): 
      p = multiprocessing.Process(target=functionA, args=(minI, minJ))
      p.start()
      def functionB((a, b)):
        subprocess.call('program1 %s %s %s %s %s %s' %(c, a, b, 'file1', 
          'file2', 'file3'), shell=True)
        for d in ['a', 'b', 'c']:
          subprocess.call('program2 %s %s %s %s %s' %(d, 'file4', 'file5', 
            'file6', 'file7'), shell=True)
      abProduct = list(itertools.product(range(0, 10), range(0, 10)))
      pool = multiprocessing.Pool(processes=numProcessors)
      pool.map(functionB, abProduct) 

It produces the following error.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run 
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 255, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The contents of functionA are unimportant, and do not produce an error. The error seems to occur when I try to map functionB. How do I remove this error, and what is the best way to parallelize this code in Python 2.6?

idealistikz asked Jul 02 '12 03:07


1 Answer

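You are most likely seeing this because of where and when you define your pool, objects, and functions. multiprocessing is not quite the same as using threads: each worker process starts with its own copy of the environment, and the callable you hand to pool.map is pickled and sent to the workers. pickle stores a function only as a reference to its module-level name, so a function defined inside another function (like functionB inside main) cannot be pickled, and the pool fails. Similarly, on Unix, where the workers are forked when the pool is created, anything you define after that point does not exist in the workers.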
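To see the pickling limitation on its own, without any pool involved, here is a minimal sketch. The names top_level, make_nested, and is_picklable are made up for this illustration. The helper catches both PicklingError and AttributeError because the exact exception type for a nested function differs between Python versions.

```python
import pickle


def top_level(x):
    # Defined at module level, so pickle can refer to it by name.
    return x


def make_nested():
    # Returns a function that only exists inside this call's scope.
    def nested(x):
        return x
    return nested


def is_picklable(obj):
    # Hypothetical helper: report whether pickle.dumps accepts obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == '__main__':
    print(is_picklable(top_level))
    print(is_picklable(make_nested()))
```

The nested function is rejected for the same reason pool.map rejects functionB in the question: there is no module-level name the receiving process could use to find it.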

First, try creating one pool before your big loop:

# sys.argv[0] is the script name, and every argument arrives as a string
(minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv[1:]
pool = multiprocessing.Pool(processes=int(numProcessors))
for i in range(int(minI), int(maxI), int(iStep)):
    ...

Then, move your target callable outside the dynamic loop:

def functionB(ab):
    a, b = ab  # pool.map passes each (a, b) tuple as a single argument
    ...

def main():
    ...
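Put together, the two changes might look roughly like the sketch below. This is an assumption-laden outline, not your actual program: function_b only builds a command string as a stand-in for the subprocess.call lines (the undefined c from the question is left out), run_all is a made-up wrapper name, and the process count of 2 is arbitrary.

```python
import itertools
import multiprocessing


def function_b(ab):
    """Module-level worker, so pickle can find it by name."""
    a, b = ab  # pool.map hands over each (a, b) tuple as one argument
    # Stand-in for the subprocess.call(...) lines in the question;
    # building the string keeps the sketch runnable without program1.
    return 'program1 %s %s file1 file2 file3' % (a, b)


def run_all(num_processors):
    """Create one pool, map the worker over every (a, b) pair, clean up."""
    ab_product = list(itertools.product(range(0, 10), range(0, 10)))
    pool = multiprocessing.Pool(processes=num_processors)
    try:
        return pool.map(function_b, ab_product)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    commands = run_all(num_processors=2)
    print(len(commands))
```

The pool is created once rather than inside the nested loops, and the `if __name__ == '__main__':` guard keeps worker processes from re-running the startup code on platforms that import the main module instead of forking.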

Consider this example...

broken

import multiprocessing

def broken():
    vals = [1,2,3]

    def test(x):
        return x

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

broken()
# PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

working

import multiprocessing

def test(x):
    return x

def working():
    vals = [1,2,3]

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

working()
# [1, 2, 3]
jdi answered Sep 17 '22 21:09