
Telling python multiprocessing which pickle protocol should be used for serialization [duplicate]

How can I change the serialization method used by the Python multiprocessing library? In particular, multiprocessing in Python 3.6 pickles objects with that version's default pickle protocol, version 3, whereas Python 2.7 understands pickle protocols only up to version 2. How can I set the protocol version to 2 in Python 3.6, so that I can use some of the classes in the multiprocessing library (like Client and Listener) to communicate between a server process run by Python 2.7 and a client process run by Python 3.6?
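To make the mismatch concrete, here is a minimal sketch using only the standard pickle module. The helper name pickle_protocol is made up for this illustration; it relies on the fact that pickles written with protocol 2 or later begin with the PROTO opcode (0x80) followed by a byte holding the protocol number.

```python
import pickle

obj = {"answer": 42}

# Protocol 2 is the highest protocol that Python 2.7 can read.
py2_compatible = pickle.dumps(obj, protocol=2)
# Without an explicit protocol, Python 3 uses pickle.DEFAULT_PROTOCOL (3 or higher).
default_payload = pickle.dumps(obj)


def pickle_protocol(data):
    """Return the protocol number recorded in a pickle (hypothetical helper)."""
    # Protocols 0 and 1 have no PROTO header, so they cannot be read this way.
    if data[:1] != b"\x80":
        raise ValueError("no PROTO opcode; pickle uses protocol 0 or 1")
    return data[1]


print(pickle_protocol(py2_compatible))  # 2
print(pickle_protocol(default_payload), pickle.DEFAULT_PROTOCOL)
```

A Python 2.7 process can load the first payload, but raises "unsupported pickle protocol" on the second.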

(Side note: as a test, I modified line 206 of multiprocessing/connection.py by adding protocol=2 to the dump() call to force the protocol version to 2 and my client/server processes worked in my limited testing with the server run by 2.7 and the client by 3.6).

In Python 3.6, a patch was merged to let the serializer be set, but the patch was undocumented, and I haven't figured out how to use it. Here is how I tried to use it (I posted this also to the Python ticket that I linked to):

pickle2reducer.py:

from multiprocessing.reduction import ForkingPickler, AbstractReducer

class ForkingPickler2(ForkingPickler):
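    def __init__(self, *args):
        # args arrives as a tuple (file[, protocol, ...]); copy it to a list
        # so the protocol slot can be forced to 2.
        args = list(args)
        if len(args) > 1:
            args[1] = 2
        else:
            args.append(2)
        super().__init__(*args)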

    @classmethod
    def dumps(cls, obj, protocol=2):
        return ForkingPickler.dumps(obj, protocol)


def dump(obj, file, protocol=2):
    ForkingPickler2(file, protocol).dump(obj)


class Pickle2Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler2
    register = ForkingPickler2.register
    dump = dump

and in my client:

import multiprocessing
import pickle2reducer
multiprocessing.reducer = pickle2reducer.Pickle2Reducer()

at the top before doing anything else with multiprocessing. I still see ValueError: unsupported pickle protocol: 3 on the server run by Python 2.7 when I do this.

asked Jul 15 '17 14:07 by ws_e_c421



2 Answers

I believe the patch you're referring to works if you're using a multiprocessing "context" object.

Using your pickle2reducer.py, your client should start with:

import pickle2reducer
import multiprocessing as mp

ctx = mp.get_context()
ctx.reducer = pickle2reducer.Pickle2Reducer()

And ctx has the same API as multiprocessing.
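For reference, a self-contained sketch of the same idea without the separate pickle2reducer.py file. It is an illustration under assumptions, not a guaranteed recipe: the class names are taken from the question, the reducer is assumed to be installed before multiprocessing.connection is first imported (that module looks up the pickler when it is imported), and not every multiprocessing code path may honor a custom reducer in every Python version.

```python
import multiprocessing as mp
from multiprocessing.reduction import AbstractReducer, ForkingPickler


class ForkingPickler2(ForkingPickler):
    """ForkingPickler whose dumps() defaults to pickle protocol 2."""

    @classmethod
    def dumps(cls, obj, protocol=2):
        return super().dumps(obj, protocol)


class Pickle2Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler2
    register = ForkingPickler2.register


# Install the reducer through the context object, as the answer suggests.
ctx = mp.get_context()
ctx.reducer = Pickle2Reducer()

# dumps() returns a buffer; its first two bytes are the PROTO opcode (0x80)
# and the protocol number.
header = bytes(ForkingPickler2.dumps({"answer": 42}))[:2]
print(header)  # b'\x80\x02'
```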

Hope that helps!

answered Oct 16 '22 17:10 by Daniel Liu


Thanks so much for this. It led me exactly to the solution I needed. I ended up doing something similar but by modifying the Connection class. It felt cleaner to me than making my own full subclass and replacing that.

from multiprocessing.connection import Connection, _ForkingPickler, Client, Listener

def send_py2(self, obj):
    self._check_closed()
    self._check_writable()
    self._send_bytes(_ForkingPickler.dumps(obj, protocol=2))

Connection.send = send_py2

This is exactly the code of Connection.send from multiprocessing.connection, with only the protocol=2 argument added.

I suppose you could even do the same thing by directly editing the original ForkingPickler class inside of multiprocessing.reduction.
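If patching private methods feels too fragile, a possible alternative is to stay on the public Connection API: pickle the object yourself with protocol 2 and pass the bytes to send_bytes(). This is only a sketch; the helper name send_py2_compatible is invented here, and it assumes the receiving side calls recv(), which unpickles one complete message. A local Pipe stands in for the Client/Listener pair so the example runs on its own.

```python
import pickle
from multiprocessing import Pipe


def send_py2_compatible(conn, obj):
    """Send obj over conn pickled with protocol 2 (hypothetical helper)."""
    # send_bytes() transmits the payload as one complete message, the same
    # framing that recv() expects on the other end.
    conn.send_bytes(pickle.dumps(obj, protocol=2))


# Stand-in for a real Client/Listener connection.
reader, writer = Pipe(duplex=False)
send_py2_compatible(writer, {"answer": 42})
received = reader.recv()  # recv() unpickles the message as usual
print(received)  # {'answer': 42}

writer.close()
reader.close()
```

Unlike replacing Connection.send, this leaves other connections in the process untouched, at the cost of having to call the helper explicitly wherever an object is sent.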

answered Oct 16 '22 16:10 by Jase Lindgren