Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Passing a function that accepts class member functions as variables into python multiprocess pool.map()

Hi I've been struggling with this for the better part of the morning and was hoping someone could point me in the right direction.

This is the code I have at the moment:

def f(tup):
    return some_complex_function(*tup)

def main():

    pool = Pool(processes=4) 
    #import and process data omitted 
    _args = [(x.some_func1, .05, x.some_func2) for x in list_of_some_class]
    results = pool.map(f, _args)
    print results

The first error I get is:

> Exception in thread Thread-2: Traceback (most recent call last):  
> File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()   File "/usr/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in
> _handle_tasks
>     put(task) PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Any help would be very appreciated.

like image 352
ast4 Avatar asked Jan 22 '13 17:01

ast4


People also ask

What is Pool map in Python?

It supports asynchronous results with timeouts and callbacks and has a parallel map implementation. — multiprocessing — Process-based parallelism. The built-in map() function allows you to apply a function to each item in an iterable. The Python process pool provides a parallel version of the map() function.

What is pool in multiprocessing Python?

The Pool class in multiprocessing can handle an enormous number of processes. It allows you to run multiple jobs per process (due to its ability to queue the jobs). The memory is allocated only to the executing processes, unlike the Process class, which allocates memory to all the processes.

How do you pass arguments in multiprocessing in Python?

Passing Keyword Arguments to Multiprocessing Processes We can also pass in arguments corresponding to the parameter name using the kwargs parameter in the Process class. Instead of passing a tuple, we pass a dictionary to kwargs where we specify the argument name and the variable being passed in as that argument.


2 Answers

The multiprocess module uses the pickle module to serialize the arguments passed to the function (f), which is executed in another process.

Many of the built-in types can be pickled, but instance methods cannot be pickled. So .05 is fine, but x.some_func1 isn't. See What can be pickled and unpickled? for more details.

There's no simple solution. You'll need to restructure your program so instance methods don't need to be passed as arguments (or avoid using multiprocess).

like image 188
Jon-Eric Avatar answered Sep 29 '22 06:09

Jon-Eric


If you use a fork of multiprocessing called pathos.multiprocesssing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in python.

pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6]))

See: What can multiprocessing and dill do together?

and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]

Get the code here: https://github.com/uqfoundation/pathos

like image 35
Mike McKerns Avatar answered Sep 29 '22 06:09

Mike McKerns