
Why not use pickle instead of struct?

I am unable to understand when to use the pickle module versus the struct module. Both convert a Python object into a byte stream. It seems easier to use pickle than to do the packing and unpacking that the struct module requires. So when is pickle used and when is struct used?

debashish asked Dec 10 '22 08:12



1 Answer

I think you have a misunderstanding of what struct does.

Struct

Struct is not meant to store arbitrary Python objects in a byte stream. What it does is produce a byte stream by converting Python values into fixed binary structures that represent the data those values contain. For instance, it can use a signed 32-bit representation for an integer. But struct is not designed to store a dictionary, for instance, since there are many ways to serialize a dictionary and struct does not pick one.
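As a small illustration of the signed 32-bit case, here is a minimal sketch. The value 1000 and the little-endian format string "<i" are my own choices for the example, not something the question specified.

```python
import struct

# Pack the integer 1000 as a signed 32-bit little-endian value ("<i").
packed = struct.pack("<i", 1000)
print(packed)       # b'\xe8\x03\x00\x00'
print(len(packed))  # 4

# The bytes carry no type information, so unpacking needs the same format.
(value,) = struct.unpack("<i", packed)
print(value)        # 1000
```

Note that the caller has to supply the format string both ways; nothing in the four bytes says "this is an integer".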

It is used to construct a (binary) file that meets the requirements of a given format or protocol. For instance, if you have a 3D model, you may want to write an exporter for the .3ds file format. This format follows a certain protocol (for instance, it starts with 0x4d4d). You cannot use pickle to dump to such a format, since pickle is itself a specific protocol.
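A rough sketch of what writing such a header with struct could look like. The layout used here (a 16-bit identifier followed by a 32-bit length, little-endian) and the name write_header are assumptions for illustration only; check the actual format specification before relying on it.

```python
import io
import struct

def write_header(stream, chunk_id, length):
    # Assumed layout: unsigned 16-bit id, then unsigned 32-bit length,
    # both little-endian ("<" also disables padding between fields).
    stream.write(struct.pack("<HI", chunk_id, length))

buf = io.BytesIO()           # stands in for a real file opened in "wb" mode
write_header(buf, 0x4D4D, 6)
header_bytes = buf.getvalue()
print(header_bytes)          # b'MM\x06\x00\x00\x00'
```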

The same holds for reading binary files into Python objects. You cannot run pickle over a .3ds file, since pickle does not know the protocol. It does not know what 0x4d4d at the beginning of the file means. It could be a 16-bit integer (19789), it could be a 2-character ASCII string ('MM'), etc. Most binary formats are designed for one purpose, and you need to understand the protocol in order to read or write such files.
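To make the ambiguity concrete, here is a minimal sketch that reads the same two bytes both ways with struct. The variable names are mine.

```python
import struct

header = b"\x4d\x4d"  # the first two bytes of the file

# Interpreted as one unsigned 16-bit integer. Byte order does not matter
# here because both bytes are equal.
(as_int,) = struct.unpack("<H", header)

# Interpreted as a 2-character ASCII string.
(as_bytes,) = struct.unpack("2s", header)
as_text = as_bytes.decode("ascii")

print(as_int)   # 19789
print(as_text)  # MM
```

Which reading is right depends entirely on the format string you pass, that is, on your knowledge of the protocol.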

Pickle

Pickle, on the other hand, is a tool designed to store Python objects in a binary stream, such that we can load these objects back when we need them. It defines its own protocol. For instance, with protocol 2 or later, pickle starts the stream with byte 128, followed by the protocol version (2, 3, 4, or 5). Later bytes are opcodes that identify the kind of object being pickled (for instance 75 for a small integer, 88 for a string, etc.).
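You can look at these bytes yourself. The sketch below pins the protocol to 2 so the output stays short; newer protocols add extra framing bytes after the version.

```python
import pickle

data = pickle.dumps(5, protocol=2)
print(data)        # b'\x80\x02K\x05.'
print(list(data))  # [128, 2, 75, 5, 46]

# 128 = start marker, 2 = protocol version, 75 = one-byte integer follows,
# 5 = the value itself, 46 = end of the pickle.
text_data = pickle.dumps("a", protocol=2)
print(text_data[2])  # 88, the opcode for a string
```

The standard library's pickletools.dis function prints a readable listing of the opcodes if you want to dig further.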

Pickle also has to serialize everything the object refers to, and keep track of the objects it has already serialized, since the object graph can contain cycles. For instance, if we have two dictionaries:

d = {}
e = {'a': d}
d['a'] = e

then we cannot simply serialize d and serialize e as part of d. We have to keep track of the fact that we have already serialized d, since serializing e would otherwise result in serializing d again, and so on, until we run out of memory.
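A quick check that pickle copes with this cycle, as a minimal sketch (the name restored is mine):

```python
import pickle

# Two dictionaries that refer to each other.
d = {}
e = {'a': d}
d['a'] = e

# Round-trip d through pickle; the cycle is preserved, not expanded forever.
restored = pickle.loads(pickle.dumps(d))

print(restored['a']['a'] is restored)  # True
```

The restored dictionaries are new objects, but they point at each other exactly as the originals did.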

Pickle is thus a specific protocol for storing Python objects. We cannot use it to serialize to some other specific format so that other (non-Python) programs can read the result.

Willem Van Onsem answered Dec 12 '22 20:12
