Hi, I've been struggling with this for the better part of the morning and was hoping someone could point me in the right direction.
This is the code I have at the moment:
from multiprocessing import Pool

def f(tup):
    return some_complex_function(*tup)

def main():
    pool = Pool(processes=4)
    # import and process data omitted
    _args = [(x.some_func1, .05, x.some_func2) for x in list_of_some_class]
    results = pool.map(f, _args)
    print results
The first error I get is:
> Exception in thread Thread-2:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>     put(task)
> PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
Any help would be much appreciated.
The multiprocessing.Pool class supports asynchronous results with timeouts and callbacks and has a parallel map implementation (quoted from the Python documentation for multiprocessing, "Process-based parallelism"). The built-in map() function applies a function to each item in an iterable; Pool.map() is the parallel version of it, splitting the iterable across the worker processes.
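As a rough illustration of that correspondence, here is a minimal sketch that runs the same toy function through the built-in map() and through Pool.map(). The names square, serial_squares and parallel_squares are made up for this example and are not from the question.

```python
from multiprocessing import Pool


def square(n):
    """Toy workload (hypothetical). Defined at module level so it can be pickled."""
    return n * n


def serial_squares(values):
    # Built-in map: runs in the current process, one item after another.
    return list(map(square, values))


def parallel_squares(values, processes=2):
    # Pool.map: same call shape, but items are handed out to worker processes.
    pool = Pool(processes=processes)
    try:
        return pool.map(square, values)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(serial_squares(data))
    print(parallel_squares(data))
```

Both calls should return the same list in the same order; only where the work runs differs.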
A Pool keeps a fixed number of worker processes and queues jobs for them, so each process can run many jobs and the pool can work through far more jobs than it has processes. Only those workers exist at any one time, so memory is used only by the processes that are actually executing, whereas creating one Process object per job starts, and allocates memory for, every process.
Passing keyword arguments to multiprocessing processes: we can also pass arguments by parameter name using the kwargs parameter of the Process class. Instead of passing a tuple (as with args), we pass a dictionary to kwargs that maps each parameter name to the value to be passed for it.
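A small sketch of what that might look like. The function scale, its parameters, and the helper run_scaled are invented for illustration; a multiprocessing.Queue is used here only as one possible way to get the result back to the parent.

```python
from multiprocessing import Process, Queue


def scale(value, factor, out):
    """Hypothetical worker: multiply value by factor and put the result on a queue."""
    out.put(value * factor)


def run_scaled(value, factor):
    out = Queue()
    # kwargs maps each parameter name of the target to the value passed for it.
    proc = Process(target=scale,
                   kwargs={"value": value, "factor": factor, "out": out})
    proc.start()
    result = out.get()   # read the result before joining the child
    proc.join()
    return result


if __name__ == "__main__":
    print(run_scaled(3, 4))
```

The same call could be written with args=(3, 4, out); kwargs just makes it explicit which value goes to which parameter.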
The multiprocessing module uses the pickle module to serialize the arguments passed to the function (f), which is executed in another process.
Many of the built-in types can be pickled, but under Python 2.7 (the version in your traceback) instance methods cannot. So .05 is fine, but x.some_func1 isn't. See "What can be pickled and unpickled?" in the pickle documentation for more details.
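If you want to check which of your arguments are the problem, a quick probe along these lines may help. can_pickle is a made-up helper, not part of pickle. A lambda stands in for the unpicklable argument because it fails to pickle on both Python 2 and Python 3, whereas, as far as I know, whether a bound method pickles depends on the Python version.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(0.05))           # a plain float
    print(can_pickle(len))            # a built-in function, pickled by name
    print(can_pickle(lambda x: x))    # an anonymous function
```

Running the same probe over each element of one of the tuples in _args should show which entries Pool.map would choke on.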
There's no simple solution. You'll need to restructure your program so that instance methods don't need to be passed as arguments (or avoid using multiprocessing).
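One possible restructuring, sketched under assumptions: instead of passing a bound method, pass the instance together with the method's name and look the method up inside the worker. Scaler, scale and call_method are hypothetical stand-ins (the classes in the question aren't shown), and this only works if the instances themselves can be pickled.

```python
from multiprocessing import Pool


class Scaler(object):
    """Hypothetical stand-in for the classes in list_of_some_class."""

    def __init__(self, factor):
        self.factor = factor

    def scale(self, value):
        return self.factor * value


def call_method(task):
    """Module-level helper: resolve the method by name inside the worker."""
    obj, method_name, arg = task
    return getattr(obj, method_name)(arg)


def main():
    objects = [Scaler(f) for f in (1, 2, 3)]
    # Each task carries the instance and a method *name*, never a bound method.
    tasks = [(obj, "scale", 10) for obj in objects]
    pool = Pool(processes=2)
    try:
        results = pool.map(call_method, tasks)
    finally:
        pool.close()
        pool.join()
    print(results)


if __name__ == "__main__":
    main()
```

The instance is pickled as ordinary data and the method name is just a string, so nothing of type instancemethod has to cross the process boundary.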
If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.
pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).
See: What can multiprocessing and dill do together?
and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>>
>>> p = Pool(4)
>>>
>>> def add(x,y):
...     return x+y
...
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>>
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>>
>>> class Test(object):
...     def plus(self, x, y):
...         return x+y
...
>>> t = Test()
>>>
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>>
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
Get the code here: https://github.com/uqfoundation/pathos