
Using multiprocessing with a decorated function results in a PicklingError

I am trying to write a convenience function, based on the multiprocessing library, that takes any function and its arguments and runs that function in multiple processes. I am importing the following file, "MultiProcFunctions.py":

import multiprocessing
from multiprocessing import Manager

def MultiProcDecorator(f,*args):

    """
    Takes a function f, and formats it so that results are saved to a shared dict
    """

    def g(procnum,return_dict,*args):
        result = f(*args)
        return_dict[procnum] = result

    return g

def MultiProcFunction(f,n_procs,*args):
    """
    Takes a function f, and runs it in n_procs with given args
    """

    manager     = Manager()
    return_dict = manager.dict()

    jobs = []
    for i in range(n_procs):
        p = multiprocessing.Process( target = f, args = (i,return_dict) + args )
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()

    return dict(return_dict)

Here is the code I run:

from MultiProcFunctions import *

def sq(x):
    return [i**2 for i in x]

g = MultiProcDecorator(sq)

if __name__ == '__main__':

    result = MultiProcFunction(g,2,[1,2,3])

I get the following error: PicklingError: Can't pickle <function g at 0x01BD83B0>: it's not found as MultiProcFunctions.g

If I use the following definition for g instead, everything is fine:

def g(procnum,return_dict,x):
    result = [i**2 for i in x]
    return_dict[procnum] = result

Why do the two definitions of g behave differently, and is there anything I can do to get the original g definition to "work"?

asked Oct 26 '14 at 18:10 by killajoule


2 Answers

dano's trick seems to work only in Python 2. When I try it in Python 3, I get the following error:

pickle.PicklingError: Can't pickle <function serialize at 0x7f7a1ac1fd08>: it's not the same object as __main__.orig_fn

I solved this by "decorating" the function from the worker's initializer:

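from functools import wraps
import multiprocessing as mp
import sys

def worker_init(fn, *args):
    @wraps(fn)
    def wrapper(x):
        # wrapper logic
        pass

    # Rebind the module-level name to the wrapper inside each worker, so
    # that unpickling a reference to fn there resolves to the wrapper.
    setattr(sys.modules[fn.__module__], fn.__name__, wrapper)

# orig_fn is the undecorated top-level function; args are any extra initializer arguments.
pool = mp.Pool(initializer=worker_init, initargs=[orig_fn, *args])
# ...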
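
The initializer approach relies on pickle resolving a function by name when it is loaded. A minimal single-process sketch of that mechanism follows; the body of orig_fn and the wrapper logic are assumptions made up for illustration, and no pool is started here.

```python
import pickle
import sys
from functools import wraps


def orig_fn(x):
    # Hypothetical stand-in for the real undecorated function.
    return x + 1


def worker_init(fn):
    @wraps(fn)
    def wrapper(x):
        # Hypothetical wrapper logic: label the original result.
        return ("wrapped", fn(x))

    # Rebind the module-level name so lookups by name find the wrapper.
    setattr(sys.modules[fn.__module__], fn.__name__, wrapper)


# What the parent sends: only a reference to the function's module and name.
payload = pickle.dumps(orig_fn)

# What each worker would do in its initializer.
worker_init(orig_fn)

# What a worker gets when it unpickles the task function: the wrapper.
received = pickle.loads(payload)
result = received(1)
```

In a real pool the rebinding happens only inside the workers, so the parent can keep pickling the original function without hitting the "not the same object" check.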
answered Oct 17 '22 at 20:10 by Slartibartfast


This happens because g is defined as a nested function inside MultiProcDecorator in MultiProcFunctions, so it cannot be imported from the top level of that module. pickle serializes a function by reference (its module and name), not by value, so a function it cannot find at the top level of its module will not pickle. However, we do bind g at the top level of the __main__ module when we do this:

g = MultiProcDecorator(sq)

So, it really should be picklable. We can make it work by explicitly setting the __module__ of g to be "__main__":

g = MultiProcDecorator(sq)
g.__module__ = "__main__"  # Fix the __module__

This will allow the pickling process to work, since it will look for the definition of g in __main__, where it is defined at the top-level, rather than MultiProcFunctions, where it is only defined in a nested scope.
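
The by-name lookup can be seen without multiprocessing at all. The sketch below is written for Python 3, where the lookup uses __qualname__; make_nested, top_level and can_pickle are hypothetical names introduced only for this illustration.

```python
import pickle


def make_nested():
    # g is created inside another function, so its qualified name is
    # "make_nested.<locals>.g" and pickle cannot look it up by name.
    def g(x):
        return x * 2
    return g


def top_level(x):
    return x * 2


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for local functions varies by Python version.
        return False


nested = make_nested()
print(nested.__qualname__)    # the name pickle would have to resolve
print(can_pickle(top_level))  # top-level function: found by name
print(can_pickle(nested))     # nested function: not found by name
```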

Edit:

Note that you could also make the change in the decorator itself:

def MultiProcDecorator(f,*args):

    """
    Takes a function f, and formats it so that results are saved to a shared dict
    """

    def g(procnum,return_dict,*args):
        result = f(*args)
        return_dict[procnum] = result
    g.__module__ = "__main__"

    return g

This probably makes more sense for you, since this decorator is meant to be used strictly for multiprocessing.
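
A different technique from the one above, offered only as a sketch: replace the nested function with an instance of a class defined at module top level, which pickles without touching __module__. ResultSaver is a hypothetical name, and the example assumes the wrapped function is itself a top-level function.

```python
import pickle


class ResultSaver:
    """Hypothetical top-level replacement for the nested function g."""

    def __init__(self, f):
        self.f = f  # f must itself be a top-level (picklable) function

    def __call__(self, procnum, return_dict, *args):
        # Same behavior as g: store f's result under the process number.
        return_dict[procnum] = self.f(*args)


def sq(x):
    return [i**2 for i in x]


g = ResultSaver(sq)

# The instance survives a pickle round trip, which is what multiprocessing
# requires whenever it has to send the target to a child process.
restored = pickle.loads(pickle.dumps(g))

results = {}  # a plain dict stands in for manager.dict() here
restored(0, results, [1, 2, 3])
```

An instance such as ResultSaver(sq) could then be passed to MultiProcFunction in place of g.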

answered Oct 17 '22 at 22:10 by dano