I am trying to write a convenience function based on the multiprocessing library that takes any function and its arguments, and runs that function using multiple processes. I have the following file "MultiProcFunctions.py" that I am importing:
import multiprocessing
from multiprocessing import Manager

def MultiProcDecorator(f, *args):
    """
    Takes a function f, and formats it so that results are saved to a shared dict
    """
    def g(procnum, return_dict, *args):
        result = f(*args)
        return_dict[procnum] = result
    return g

def MultiProcFunction(f, n_procs, *args):
    """
    Takes a function f, and runs it in n_procs with given args
    """
    manager = Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(n_procs):
        p = multiprocessing.Process(target=f, args=(i, return_dict) + args)
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()

    return dict(return_dict)
Here is the code I run:
from MultiProcFunctions import *

def sq(x):
    return [i**2 for i in x]

g = MultiProcDecorator(sq)

if __name__ == '__main__':
    result = MultiProcFunction(g, 2, [1, 2, 3])
I get the following error: PicklingError: Can't pickle <function g at 0x01BD83B0>: it's not found as MultiProcFunctions.g
If I use the following definition for g instead, everything is fine:

def g(procnum, return_dict, x):
    result = [i**2 for i in x]
    return_dict[procnum] = result
Why are the two definitions of g different, and is there anything I can do to get the original g definition to "work"?
dano's trick seems to only work in Python 2. When I try it in Python 3, I get the following error:
pickle.PicklingError: Can't pickle <function serialize at 0x7f7a1ac1fd08>: it's not the same object as __main__.orig_fn
I solved this issue by "decorating" the function from the worker's initializer:

import sys
import multiprocessing as mp
from functools import wraps

def worker_init(fn, *args):
    @wraps(fn)
    def wrapper(x):
        # wrapper logic
        pass
    # Re-bind the original name in its module so lookups by name find the wrapper
    setattr(sys.modules[fn.__module__], fn.__name__, wrapper)

pool = mp.Pool(initializer=worker_init, initargs=[orig_fn, *args])
# ...
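To make the mechanism concrete, here is a small self-contained sketch under assumed names (worker_init, square, and the result-tagging wrapper body are illustrative, not taken from the code above): the pool pickles square by name, and because the initializer has re-bound that name inside each worker, the workers end up calling the wrapper instead:

import sys
import multiprocessing as mp
from functools import wraps

def worker_init(fn):
    @wraps(fn)
    def wrapper(x):
        # illustrative wrapper logic: call the original and tag the result
        return ("wrapped", fn(x))
    # Re-bind fn's name in its own module inside this worker process, so that
    # tasks that look the function up by name get the wrapper instead.
    setattr(sys.modules[fn.__module__], fn.__name__, wrapper)

def square(x):
    return x * x

if __name__ == "__main__":
    with mp.Pool(2, initializer=worker_init, initargs=[square]) as pool:
        print(pool.map(square, [1, 2, 3]))
        # -> [('wrapped', 1), ('wrapped', 4), ('wrapped', 9)]

The key point is that the decorated function is never pickled at all; only the original, importable function is sent by name, and each worker swaps in the wrapper before it runs any tasks.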
This is happening because g is actually defined as a nested function inside MultiProcFunctions, which means it's not importable from the top level of that module, which means it won't pickle properly. Now, we're actually pretty clearly defining g at the top level of the __main__ module when we do this:

g = MultiProcDecorator(sq)

So it really should be picklable. We can make it work by explicitly setting the __module__ of g to "__main__":

g = MultiProcDecorator(sq)
g.__module__ = "__main__"  # Fix the __module__

This will allow the pickling process to work, since it will look for the definition of g in __main__, where it is defined at the top level, rather than in MultiProcFunctions, where it is only defined in a nested scope.
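For reference, here is a minimal sketch of the calling script with that fix applied. This is the Python 2 form of the trick; as the comment above notes, it doesn't carry over directly to Python 3, where pickle also consults the function's __qualname__ (still MultiProcDecorator.<locals>.g), so there you would most likely need to reset that attribute as well:

from MultiProcFunctions import MultiProcDecorator, MultiProcFunction

def sq(x):
    return [i**2 for i in x]

g = MultiProcDecorator(sq)
g.__module__ = "__main__"   # pickle now looks for g in __main__
# g.__qualname__ = "g"      # likely also needed on Python 3 (see the comment above)

if __name__ == '__main__':
    # Both workers run sq([1, 2, 3]); results are keyed by process index
    result = MultiProcFunction(g, 2, [1, 2, 3])
    print(result)  # {0: [1, 4, 9], 1: [1, 4, 9]}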
Edit:
Note that you could also make the change in the decorator itself:
def MultiProcDecorator(f, *args):
    """
    Takes a function f, and formats it so that results are saved to a shared dict
    """
    def g(procnum, return_dict, *args):
        result = f(*args)
        return_dict[procnum] = result
    g.__module__ = "__main__"
    return g
This probably makes more sense for you, since this decorator is strictly meant to be used for multiprocessing purposes.