
How to pickle and unpickle to portable string in Python 3

I need to pickle a Python 3 object to a string that I can later unpickle from an environment variable in a Travis CI build. The problem is that I can't find a way to pickle to a portable (unicode) string in Python 3:

import os, pickle
from my_module import MyPickleableClass

obj = {'cls': MyPickleableClass, 'other_stuf': '(...)'}

pickled = pickle.dumps(obj)

# raises TypeError: str expected, not bytes
os.environ['pickled'] = pickled

# raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb (...)
os.environ['pickled'] = pickled.decode('utf-8')

pickle.loads(os.environ['pickled'])

Is there a way to serialize complex objects like datetime.datetime to unicode or to some other string representation in Python3 which I can transfer to a different machine and deserialize?

Update

I have tested the solutions suggested by @kindall. With my data, pickle.dumps(obj, 0).decode() raises a UnicodeDecodeError. The base64 approach works, but it needs an extra decode/encode step because the codec returns bytes. That solution works on both Python 2.x and Python 3.x.

import codecs, pickle

# encode returns bytes so it needs to be decoded to string
pickled = codecs.encode(pickle.dumps(obj), 'base64').decode()

type(pickled)  # <class 'str'>

unpickled = pickle.loads(codecs.decode(pickled.encode(), 'base64'))
Peter Hudec asked May 26 '15 21:05




2 Answers

pickle.dumps() produces a bytes object. Expecting these arbitrary bytes to be valid UTF-8 text (the assumption you are making by trying to decode it to a string from UTF-8) is pretty optimistic. It'd be a coincidence if it worked!
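As a minimal sketch of why this fails (the sample object is made up for illustration): a pickle written with protocol 2 or later starts with the byte 0x80, which can never begin a valid UTF-8 sequence, so decoding breaks on the very first byte.

```python
import datetime
import pickle

# Hypothetical sample object, chosen only for illustration.
obj = {"when": datetime.datetime(2015, 5, 26, 21, 5)}

# Protocol 2 and later start the stream with the PROTO opcode, byte 0x80.
raw = pickle.dumps(obj, protocol=2)

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(raw[:1], decoded_ok)
```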

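One solution is to use the older pickling protocol 0, which writes a text-based, mostly ASCII stream. This still comes out as bytes, and as long as the output is ASCII-only it can be decoded to a string without stress. Be aware that non-ASCII text inside the object can still put non-ASCII bytes into the stream, in which case a plain .decode() fails: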

pickled = pickle.dumps(obj, 0).decode() 
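A small sketch of where protocol 0 decodes cleanly and where it does not; the sample dictionaries are invented for illustration. Protocol 0 writes characters in the range U+0080 to U+00FF as raw single bytes, so a string such as "café" yields bytes that are not valid UTF-8. Decoding with latin-1 is one possible workaround, since it maps every byte to exactly one code point and back.

```python
import pickle

# ASCII-only content: protocol 0 output decodes without trouble.
ascii_obj = {"name": "plain ascii"}
text = pickle.dumps(ascii_obj, 0).decode("ascii")
ascii_restored = pickle.loads(text.encode("ascii"))

# Non-ASCII content: the stream now contains a raw 0xE9 byte.
non_ascii_obj = {"name": "caf\u00e9"}
raw = pickle.dumps(non_ascii_obj, 0)

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 round-trips arbitrary bytes, so it works as a transport encoding.
as_text = raw.decode("latin-1")
restored = pickle.loads(as_text.encode("latin-1"))

print(utf8_ok, restored == non_ascii_obj)
```

If the string has to survive systems that are strict about encodings, base64 is the more predictable choice.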

You could also use some other encoding method to encode a binary-pickled object to text, such as base64:

import codecs
pickled = codecs.encode(pickle.dumps(obj), "base64").decode()

Decoding would then be:

unpickled = pickle.loads(codecs.decode(pickled.encode(), "base64")) 
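For reuse, the round trip can be wrapped in two helpers. This is only a sketch: the helper names dumps_text and loads_text, the variable name PICKLED_DEMO, and the sample payload are all made up here. It uses base64.b64encode from the standard library instead of the "base64" codec; unlike the codec, b64encode does not insert line breaks into the output, which tends to be more convenient for an environment variable.

```python
import base64
import datetime
import os
import pickle


def dumps_text(obj):
    """Pickle obj and return an ASCII-only str (base64 of the binary pickle)."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def loads_text(text):
    """Inverse of dumps_text: decode the base64 text and unpickle it."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


# Hypothetical payload and variable name, for illustration only.
payload = {"when": datetime.datetime(2015, 5, 26, 21, 5), "tags": ["a", "b"]}
os.environ["PICKLED_DEMO"] = dumps_text(payload)

restored = loads_text(os.environ["PICKLED_DEMO"])
print(restored == payload)
```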

Using pickle with protocol 0 seems to result in shorter strings than base64-encoding binary pickles (and abarnert's suggestion of hex-encoding is going to be even larger than base64), but I haven't tested it rigorously or anything. Test it with your data and see.
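A quick way to run that comparison on your own data is sketched below. The sample object is a placeholder, so swap in whatever you actually need to transfer; the relative sizes depend on the object.

```python
import base64
import binascii
import datetime
import pickle

# Placeholder object; replace it with your real data.
obj = {"when": datetime.datetime(2015, 5, 26, 21, 5), "values": list(range(50))}

raw = pickle.dumps(obj)  # binary pickle, default protocol

sizes = {
    "protocol 0": len(pickle.dumps(obj, 0)),
    "base64 of binary pickle": len(base64.b64encode(raw)),
    "hex of binary pickle": len(binascii.hexlify(raw)),
}

for label, size in sizes.items():
    print(label, size)
```

Hex always doubles the size of the binary pickle and base64 grows it by about a third, while the protocol 0 size varies with the content.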

kindall answered Oct 07 '22 21:10



If you want to store bytes in the environment, instead of encoded text, that's what os.environb is for.

This doesn't work on Windows. (As the docs imply, you should check os.supports_bytes_environ if you're on 3.2+ instead of just assuming that Unix does and Windows doesn't…) So for that, you'll need to smuggle the bytes into something that can be encoded no matter what your system encoding is, e.g., using backslash-escape, or even hex. So, for example:

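import codecs, os

if os.supports_bytes_environ:
    # keys and values are bytes here; values must not contain NUL bytes
    os.environb[b'pickled'] = pickled
else:
    # hex-encode, then decode to str because os.environ only accepts str
    os.environ['pickled'] = codecs.encode(pickled, 'hex').decode('ascii')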
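Reading the value back has to mirror whichever branch stored it. The sketch below shows both directions; the function names store and fetch, the variable name PICKLED_DEMO, and the sample object are made up for illustration. One assumption to flag: environment values cannot contain NUL bytes, and binary pickles usually do, so this sketch pickles with protocol 0, which is text-based.

```python
import codecs
import os
import pickle

VAR = "PICKLED_DEMO"  # hypothetical variable name


def store(pickled):
    """Put pickled bytes into the environment, raw if the platform allows it."""
    if os.supports_bytes_environ:
        os.environb[VAR.encode("ascii")] = pickled
    else:
        # os.environ only accepts str, so hex-encode and decode to text.
        os.environ[VAR] = codecs.encode(pickled, "hex").decode("ascii")


def fetch():
    """Return the pickled bytes stored by store()."""
    if os.supports_bytes_environ:
        return os.environb[VAR.encode("ascii")]
    return codecs.decode(os.environ[VAR].encode("ascii"), "hex")


# Protocol 0 keeps NUL bytes out of the stream for this sample object.
obj = {"name": "caf\u00e9", "n": 3}
store(pickle.dumps(obj, 0))

restored = pickle.loads(fetch())
print(restored == obj)
```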
abarnert answered Oct 07 '22 23:10
