How can I change the serialization method used by the Python multiprocessing library? In particular, the default serialization method uses the pickle library with the default pickle protocol version for that version of Python: protocol 2 in Python 2.7 and protocol 3 in Python 3.6. How can I set the protocol version to 2 in Python 3.6, so that I can use some of the classes (like Client and Listener) in the multiprocessing library to communicate between a server process run by Python 2.7 and a client process run by Python 3.6?
(Side note: as a test, I modified line 206 of multiprocessing/connection.py by adding protocol=2 to the dump() call to force the protocol version to 2, and my client/server processes worked in my limited testing, with the server run by 2.7 and the client by 3.6.)
In Python 3.6, a patch was merged to let the serializer be set, but the patch was undocumented, and I haven't figured out how to use it. Here is how I tried to use it (I posted this also to the Python ticket that I linked to):
pickle2reducer.py:

from multiprocessing.reduction import ForkingPickler, AbstractReducer

class ForkingPickler2(ForkingPickler):
    def __init__(self, *args):
        args = list(args)  # *args arrives as a tuple; make it mutable
        if len(args) > 1:
            args[1] = 2
        else:
            args.append(2)
        super().__init__(*args)

    @classmethod
    def dumps(cls, obj, protocol=2):
        return ForkingPickler.dumps(obj, protocol)

def dump(obj, file, protocol=2):
    ForkingPickler2(file, protocol).dump(obj)

class Pickle2Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler2
    register = ForkingPickler2.register
    dump = dump
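As a quick sanity check that an explicit protocol actually takes effect (independent of whether multiprocessing picks up the reducer), ForkingPickler.dumps accepts the protocol as its second argument and returns a buffer over the pickled bytes:

```python
from multiprocessing.reduction import ForkingPickler

# dumps() returns a memoryview over the pickled data; passing 2
# overrides the interpreter's default protocol.
payload = ForkingPickler.dumps({"a": 1}, 2)
assert bytes(payload)[:2] == b"\x80\x02"  # protocol-2 header
```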
and in my client:

import pickle2reducer
import multiprocessing

multiprocessing.reducer = pickle2reducer.Pickle2Reducer()

at the top, before doing anything else with multiprocessing. I still see ValueError: unsupported pickle protocol: 3 on the server run by Python 2.7 when I do this.
I believe the patch you're referring to works if you're using a multiprocessing "context" object.
Using your pickle2reducer.py, your client should start with:
import pickle2reducer
import multiprocessing as mp
ctx = mp.get_context()
ctx.reducer = pickle2reducer.Pickle2Reducer()
And ctx has the same API as multiprocessing.
Hope that helps!
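To illustrate the "same API" point, here is a small check (my understanding of the 3.6 behavior, not something the docs spell out): the context object returned by get_context() exposes the familiar module-level constructors, and in 3.6+ it also carries the settable reducer attribute the patch added:

```python
import multiprocessing as mp

ctx = mp.get_context()

# The context mirrors the top-level module's API; `reducer` is the
# attribute added by the (undocumented) 3.6 patch.
for name in ("Process", "Queue", "Pipe", "Pool", "reducer"):
    assert hasattr(ctx, name), name
```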
Thanks so much for this. It led me exactly to the solution I needed. I ended up doing something similar but by modifying the Connection class. It felt cleaner to me than making my own full subclass and replacing that.
from multiprocessing.connection import Connection, _ForkingPickler, Client, Listener

def send_py2(self, obj):
    self._check_closed()
    self._check_writable()
    self._send_bytes(_ForkingPickler.dumps(obj, protocol=2))

Connection.send = send_py2
This is exactly the code from multiprocessing.connection with only the protocol=2 argument added.
I suppose you could even do the same thing by directly editing the original ForkingPickler
class inside of multiprocessing.reduction.
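A sketch of that alternative, under the assumption that patching the class in place is acceptable (it is a monkey-patch, not a public API): wrap the original classmethod so its default protocol becomes 2. Because modules like multiprocessing.connection hold a reference to the same class object, the change is picked up everywhere.

```python
import multiprocessing.reduction as reduction

# Grab the plain function underneath the original classmethod.
_orig_dumps = reduction.ForkingPickler.dumps.__func__

@classmethod
def _dumps_protocol2(cls, obj, protocol=2):
    # Identical to the original, but defaulting to protocol 2.
    return _orig_dumps(cls, obj, protocol)

reduction.ForkingPickler.dumps = _dumps_protocol2

payload = reduction.ForkingPickler.dumps({"a": 1})
assert bytes(payload)[:2] == b"\x80\x02"  # now protocol 2 by default
```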