I need to execute the code below (simplified version of my real code base in Python 3.5):
import multiprocessing

def forever(do_something=None):
    while True:
        do_something()

p = multiprocessing.Process(target=forever, args=(lambda: print("do something"),))
p.start()
In order to create the new process, Python needs to pickle the target function and the lambda passed to it. Unfortunately, pickle cannot serialize lambdas, and the output is like this:
_pickle.PicklingError: Can't pickle <function <lambda> at 0x00C0D4B0>: attribute lookup <lambda> on __main__ failed
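The failure can be reproduced without starting a process at all. The sketch below uses only the standard library; `can_pickle` is a hypothetical helper name introduced here for illustration. It shows that pickle stores functions by reference (module plus qualified name), which works for an importable function such as the builtin `len` but not for a lambda, whose name `<lambda>` cannot be looked up again.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions are pickled by reference (module + qualified name).
        # A lambda has no importable name, so the lookup fails.
        return False


if __name__ == "__main__":
    print(can_pickle(len))           # builtin, importable by name
    print(can_pickle(lambda: None))  # anonymous, not importable by name
```

This is the same check multiprocessing effectively performs on the target and its arguments when the child process does not inherit the parent's memory.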
I discovered cloudpickle, which can serialize and deserialize lambdas and closures using the same interface as pickle.
How can I force the Python multiprocessing module to use cloudpickle instead of pickle?
Clearly hacking the code of the standard lib multiprocessing is not an option!
Thanks
Charlie
However, not every task handed to multiprocessing can be pickled; when it cannot, starting the task raises a pickling error. Pickling is needed because the processes that share the work do not share memory space, so any data they need has to be serialized and sent to them.
multiprocessing has been distributed as part of the standard library since Python 2.6.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
dill can be used to store python objects to a file, but the primary usage is to send python objects across the network as a byte stream. dill is quite flexible, and allows arbitrary user defined classes and functions to be serialized.
Try multiprocess. It's a fork of multiprocessing that uses the dill serializer instead of pickle; there are no other changes in the fork.
I'm the author. I encountered the same problem as you several years ago, and ultimately I decided that hacking the standard library was my only choice, as some of the pickle code in multiprocessing is in C.
>>> import multiprocess as mp
>>> p = mp.Pool()
>>> p.map(lambda x:x**2, range(4))
[0, 1, 4, 9]
>>>
If you're willing to do a little monkeypatching, a quick fix is to sub out pickle.Pickler:
import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler
or, in more recent versions of Python where _pickle.Pickler is pulled in,
from multiprocessing import reduction
import cloudpickle
reduction.ForkingPickler = cloudpickle.Pickler
Just make sure to do this before importing multiprocessing. Here's a full example:
import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

import multiprocessing as mp
mp.set_start_method('spawn', True)

def procprint(f):
    print(f())

if __name__ == '__main__':
    p = mp.Process(target=procprint, args=(lambda: "hello",))
    p.start()
    p.join()
As an aside, you won't need to do any of this if your start method is fork, since with forking nothing needs to be pickled in the first place.
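To illustrate the aside about fork, here is a minimal sketch that passes a lambda to a child process with no monkeypatching. It assumes a platform where the fork start method is available (Linux and other Unix-likes, not Windows). The names `_child` and `run_with_fork` are hypothetical helpers introduced for this example, and the exit code is used only as a simple way to report the result back to the parent.

```python
import multiprocessing
import sys


def _child(f, expected):
    # Runs in the forked child. The lambda was inherited through the
    # forked memory image, so it never had to be pickled.
    sys.exit(0 if f() == expected else 1)


def run_with_fork(f, expected):
    """Call f in a forked child; return the child's exit code."""
    ctx = multiprocessing.get_context("fork")  # Unix-only start method
    p = ctx.Process(target=_child, args=(f, expected))
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    print(run_with_fork(lambda: "hello", "hello"))
```

Requesting the context explicitly with `get_context("fork")` avoids changing the global start method, which matters if other parts of the program rely on spawn.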