
cPickle - different results pickling the same object

Is anyone able to explain the comment under testLookups() in this code snippet?

I've run the code, and what the comment says is indeed true. However, I'd like to understand why it's true, i.e. why cPickle outputs different values for the same object depending on how it is referenced.

Does it have anything to do with reference counts? If so, isn't that some kind of bug, i.e. wouldn't the pickled and deserialized object have an abnormally high reference count and in effect never get garbage collected?

asked Sep 21 '11 14:09 by julx


1 Answer

There is no guarantee that seemingly identical objects will produce identical pickle strings.

The pickle protocol is a virtual machine, and a pickle string is a program for that virtual machine. For a given object there exist multiple pickle strings (=programs) that will reconstruct that object exactly.
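As a minimal sketch of that point, the two hand-written protocol 0 programs below build the same list in different ways. The byte strings `prog_a` and `prog_b` are hypothetical examples, not taken from the question, and the code is written for Python 3's `pickle` module rather than Python 2's `cPickle`.

```python
import pickle
import pickletools

# Program A: MARK, push 1, push 2, then LIST collects everything above the mark.
prog_a = b"(I1\nI2\nl."
# Program B: MARK, LIST makes an empty list, then each item is APPENDed.
prog_b = b"(lI1\naI2\na."

# Run both programs on the pickle virtual machine.
list_a = pickle.loads(prog_a)
list_b = pickle.loads(prog_b)

# List the opcode names each program consists of.
ops_a = [op.name for op, arg, pos in pickletools.genops(prog_a)]
ops_b = [op.name for op, arg, pos in pickletools.genops(prog_b)]

print(ops_a, "->", list_a)
print(ops_b, "->", list_b)
print("same bytes:", prog_a == prog_b, "| same object:", list_a == list_b)
```

The opcode sequences differ, yet both programs reconstruct `[1, 2]`.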

To take one of your examples:

>>> from cPickle import dumps
>>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
>>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
>>> dumps(t)
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."

The two pickle strings differ only in where they use the p opcode. The opcode takes one integer argument, and the pickletools module describes its function as follows:

  name='PUT'    code='p'   arg=decimalnl_short

  Store the stack top into the memo.  The stack is not popped.

  The index of the memo location to write into is given by the newline-
  terminated decimal string following.  BINPUT and LONG_BINPUT are
  space-optimized versions.

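To cut a long story short, the two pickle strings are equivalent. They differ only in which intermediate objects get stored in the memo, and since neither string contains a g (GET) opcode that reads the memo back, the extra p opcodes have no effect on the result.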
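The equivalence can be checked directly. The sketch below copies the two pickle strings from the session above as bytes literals and inspects them with Python 3's `pickle` and `pickletools` (an assumption, since the original session uses Python 2's `cPickle`). The names `literal_pickle`, `variable_pickle`, and `memo_ops` are made up for illustration.

```python
import pickle
import pickletools

# Output of dumps(<literal tuple>) from the session above.
literal_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                  b"S'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n"
                  b"(lp4\nI1\naI2\naI3\naI4\naI5\nat.")
# Output of dumps(t) from the session above.
variable_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                   b"S'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt"
                   b"(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n.")

def memo_ops(data):
    """Return the (name, argument) pairs of all memo-related opcodes."""
    return [(op.name, arg)
            for op, arg, pos in pickletools.genops(data)
            if op.name in ("PUT", "GET")]

print(memo_ops(literal_pickle))   # memo writes in the first string
print(memo_ops(variable_pickle))  # memo writes in the second string

# Both programs reconstruct the same object.
obj_a = pickle.loads(literal_pickle)
obj_b = pickle.loads(variable_pickle)
print(obj_a == obj_b)

# pickletools.optimize drops PUT opcodes that no GET ever reads.
stripped_a = pickletools.optimize(literal_pickle)
stripped_b = pickletools.optimize(variable_pickle)
print(stripped_a == stripped_b)
```

Neither string contains a GET, so every PUT is dead code as far as the virtual machine is concerned. Once those unused opcodes are stripped, the two programs should come out byte-for-byte identical.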

I haven't tried to nail down the exact cause of the differences in the generated opcodes. It could well have to do with the reference counts of the objects being serialized. What is clear, however, is that discrepancies like this have no effect on the reconstructed object.

answered Sep 24 '22 09:09 by NPE