Given a pickle dump in Python, how do I determine the protocol that was used?

Tags:

pickle

Assume that I have a pickle dump, either as a file or just as a string. How can I automatically determine the protocol that was used to create it?

If that is possible, do I need to read the entire dump to figure out the protocol, or can this be achieved in O(1)? By O(1) I mean some header information at the beginning of the pickle string or file that can be read without processing the whole dump.

Thanks a lot!

EDIT: I have an update on this. Apparently the answer given below does not always work under Python 3.4. If I simply pickle the value True with protocol 1, sometimes I can only recover protocol 0 :-/

901

asked Nov 06 '13 09:11

SmCaterpillar

2 Answers

You could roll your own using pickletools:

import pickletools

with open('your_pickle_file', 'rb') as fin:
    op, fst, snd = next(pickletools.genops(fin))
    # op.proto is the protocol that introduced this opcode (2 for PROTO)
    proto = op.proto

It appears that a PROTO opcode is written as the first element only when the protocol is 2 or greater. Otherwise, the first element is an ordinary opcode, and the opcodes have to be inspected to tell whether the protocol is 0 or 1.
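
The same check can be sketched without pickletools by looking at the first two bytes directly, since the PROTO opcode is the byte 0x80 followed by a single byte holding the protocol number. This is only a minimal sketch that assumes the dump is available as bytes; the helper name peek_protocol is hypothetical, and a result of None just means "protocol 0 or 1, not determined".

```python
import pickle

PROTO_OPCODE = 0x80  # byte value of the PROTO opcode (pickle.PROTO == b'\x80')


def peek_protocol(data):
    """Return the protocol number from the PROTO header, or None if absent.

    Hypothetical helper: None means the dump has no header, which is the
    case for protocols 0 and 1; they cannot be told apart this way.
    """
    if len(data) >= 2 and data[0] == PROTO_OPCODE:
        return data[1]  # PROTO is followed by one unsigned byte
    return None


# Compare the protocol used for dumping with what the header reports.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    dumped = pickle.dumps({"answer": 42}, protocol=protocol)
    print(protocol, peek_protocol(dumped))
```

For a file, reading the first two bytes with fin.read(2) and passing them to the helper should be enough, so the cost does not depend on the size of the dump.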

Update, venturing even further into kludge land:

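import pickletools
ops = [op for op, fst, snd in pickletools.genops(pickle_source)]
# A leading PROTO means protocol >= 2; otherwise guess 0 or 1 from all opcodes, including the first
proto = 2 if ops[0].proto == 2 else int(any(op.proto for op in ops))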
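
One caveat when guessing between protocols 0 and 1 from the opcodes: some values seem to serialize to the same bytes under both, in which case no inspection can separate them. The following small check illustrates this for True, as described in the question's edit. It reflects what I would expect from CPython 3 and is meant as an observation, not a guarantee.

```python
import pickle
import pickletools

p0 = pickle.dumps(True, protocol=0)
p1 = pickle.dumps(True, protocol=1)

# Below protocol 2, booleans are written with the text opcode INT,
# so the two dumps are expected to be byte-for-byte identical.
print(p0, p1, p0 == p1)

# List each opcode in the protocol-1 dump with the protocol that introduced it.
print([(op.name, op.proto) for op, arg, pos in pickletools.genops(p1)])
```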

134

answered Sep 27 '22 18:09

Jon Clements

2020 update:

I tried the methods here (from @JonClements's answer and from the comments), but none seemed to give me the correct protocol.

The following works, however:

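import pickletools

proto = None  # stays None when there is no PROTO opcode (protocol 0 or 1)
op, fst, snd = next(pickletools.genops(data))
if op.name == 'PROTO':
    proto = fst  # the argument of PROTO is the protocol number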
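
Regarding the O(1) part of the question: pickletools.genops is a generator, so taking only its first item should read just the leading opcode and its argument, not the whole dump. A rough way to observe this, assuming an in-memory stream stands in for the file (the names payload and stream are only for illustration):

```python
import io
import pickle
import pickletools

# A reasonably large dump, so that "whole dump" and "header" clearly differ.
payload = pickle.dumps(list(range(100_000)), protocol=pickle.HIGHEST_PROTOCOL)
stream = io.BytesIO(payload)

op, arg, pos = next(pickletools.genops(stream))

print(op.name, arg)                 # first opcode and its argument
print(stream.tell(), len(payload))  # bytes consumed so far vs. total size
```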

Alternative (not cool, as it disassembles the whole thing):

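import io
import pickletools
import re

out = io.StringIO()
pickletools.dis(data, out)  # writes the full disassembly into `out`
firstline = out.getvalue().splitlines()[0]
if ' PROTO ' in firstline:
    # the protocol number is the last whitespace-separated token on the line
    proto = re.sub(r'.*\s+', '', firstline)
    proto = int(proto)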

Application: I want to find out which pickle protocol was used by a pandas.to_hdf() call (if pickling was used at all, which is not always the case). Since I don't fancy analyzing the whole structure of the HDF5 file, I use a MonkeyPatch to spy on what pickle.loads() is asked to deserialize.

For whoever lands here via a Google search, here is my whole (kludgy) setup:

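import pickle
import pickletools

import pandas as pd
from pytest import MonkeyPatch  # assumption: pytest >= 6.2, which exports MonkeyPatch

max_proto_found = -1
__pickle_loads = pickle.loads  # keep a reference to the real function


def mock_pickle_loads(data, *args, **kwargs):
    # Record the protocol from the PROTO header, then delegate to the real loads.
    global max_proto_found
    op, fst, snd = next(pickletools.genops(data))
    if op.name == 'PROTO':
        proto = fst
        max_proto_found = max(max_proto_found, proto)
    return __pickle_loads(data, *args, **kwargs)


def max_pklproto_hdf(hdf_filename):
    # Returns the highest protocol seen while reading, or -1 if no PROTO header was seen.
    global max_proto_found
    max_proto_found = -1
    with MonkeyPatch().context() as m:
        m.setattr(pickle, 'loads', mock_pickle_loads)
        try:
            pd.read_hdf(hdf_filename)
        except ValueError:
            pass
    return max_proto_found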

answered Sep 27 '22 16:09

Pierre D
