
Replace pickle in Python multiprocessing lib

I need to execute the code below (simplified version of my real code base in Python 3.5):

import multiprocessing
def forever(do_something=None):
    while True:
        do_something()

p = multiprocessing.Process(target=forever, args=(lambda: print("do  something"),))
p.start()

To create the new process, Python needs to pickle the target function and its arguments, including the lambda. Unfortunately, pickle cannot serialize lambdas, and the output looks like this:

_pickle.PicklingError: Can't pickle <function <lambda> at 0x00C0D4B0>: attribute lookup <lambda> on __main__ failed
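The same failure can be reproduced without multiprocessing at all. The snippet below is only a minimal sketch using the standard pickle module; `can_pickle` is a hypothetical helper name introduced for illustration, not part of any library.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Which exception is raised depends on the Python version and on
        # where the function was defined, so both are caught here.
        return False


# Functions are pickled by reference (module name plus qualified name),
# so a function that can be looked up by name is fine.
named_ok = can_pickle(len)

# A lambda is named "<lambda>", which cannot be looked up on its module.
lambda_ok = can_pickle(lambda: print("do something"))

print(named_ok, lambda_ok)
```

This suggests the problem lies in how pickle stores functions (by name, not by value), which is why a serializer that stores the function body itself is needed.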

I discovered cloudpickle, which can serialize and deserialize lambdas and closures using the same interface as pickle.

How can I force the Python multiprocessing module to use cloudpickle instead of pickle?

Clearly hacking the code of the standard lib multiprocessing is not an option!

Thanks

Charlie

Charlie asked Oct 25 '16 08:10



2 Answers

Try multiprocess. It's a fork of multiprocessing that uses the dill serializer instead of pickle -- there are no other changes in the fork.

I'm the author. I encountered the same problem as you several years ago, and ultimately I decided that hacking the standard library was my only choice, as some of the pickle code in multiprocessing is in C.

>>> import multiprocess as mp
>>> p = mp.Pool()
>>> p.map(lambda x:x**2, range(4))
[0, 1, 4, 9]
>>> 
Mike McKerns answered Oct 05 '22 16:10


If you're willing to do a little monkeypatching, a quick fix is to sub out the pickle.Pickler:

import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

or, in more recent versions of Python where _pickle.Pickler is pulled in,

from multiprocessing import reduction
import cloudpickle
reduction.ForkingPickler = cloudpickle.Pickler

Just make sure to do this before importing multiprocessing. Here's a full example:

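import pickle
import cloudpickle
# Swap the Pickler class before multiprocessing is imported, so that
# multiprocessing's ForkingPickler ends up subclassing cloudpickle's Pickler.
pickle.Pickler = cloudpickle.Pickler

import multiprocessing as mp
# Force the 'spawn' start method, which pickles the target and its arguments.
mp.set_start_method('spawn', True)

def procprint(f):
    print(f())

if __name__ == '__main__':
    # The lambda in args is what the standard Pickler would reject.
    p = mp.Process(target=procprint, args=(lambda: "hello",))
    p.start()
    p.join()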

As an aside, you won't need to do any of this if your start method is fork (not available on Windows): a forked child inherits a copy of the parent's memory, so the target and its arguments don't need to be pickled in the first place.
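To illustrate the fork case, here is a rough sketch that only runs on platforms where the fork start method exists (not Windows). The helper name `call_in_forked_child` and the `timeout` parameter are made up for this example. Note that the return value still travels back through a queue, so the result (unlike the lambda) must be picklable.

```python
import multiprocessing as mp


def call_in_forked_child(func, timeout=10):
    """Run func() in a forked child process and return its result."""
    # get_context raises ValueError where 'fork' is unavailable.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    # With fork the Process object is never pickled, so both func and
    # this wrapper lambda may be arbitrary closures.
    child = ctx.Process(target=lambda: results.put(func()))
    child.start()
    # Read the result before joining so the child can flush its queue.
    value = results.get(timeout=timeout)
    child.join()
    return value


if __name__ == "__main__":
    print(call_in_forked_child(lambda: "hello"))
```

Using an explicit context rather than set_start_method keeps the choice local to this helper and leaves the global default untouched.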

Andy Jones answered Oct 05 '22 15:10