I am unable to understand the use of the pickle module vs. the struct module. Both convert a Python object into a byte stream. It seems easier to use pickle than to do the packing and unpacking of the struct module. So when is pickle used and when is struct used?
I think you have a misunderstanding of what struct does.
struct is not meant to store arbitrary Python objects in a byte stream. What it does is produce a byte stream by converting Python values into a fixed binary layout that represents the data they contain, for instance a signed 32-bit representation for an integer. struct is not designed to store a dictionary, for instance, since there are many ways to serialize a dictionary.
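As a minimal sketch of what "a signed 32-bit representation" means in practice, the snippet below packs an integer with a struct format string and unpacks it again. The format strings and variable names are chosen only for illustration.

```python
import struct

# "<i" means: little-endian, signed 32-bit integer (always 4 bytes).
packed = struct.pack("<i", -2)  # b'\xfe\xff\xff\xff' (two's complement)

# unpack always returns a tuple, even for a single value.
(value,) = struct.unpack("<i", packed)

# Several values can be laid out back to back with one format string:
# an unsigned 16-bit integer followed by a 32-bit float.
record = struct.pack("<Hf", 7, 1.5)
record_size = struct.calcsize("<Hf")  # 6 bytes; "<" disables padding
```

Note that the format string is not stored in the output. Whoever reads the bytes has to know the layout in advance.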
It is used to construct a (binary) file that meets the criteria of a protocol. For instance, if you have a 3D model, you may want to write an exporter to the .3ds file format. This format follows a certain protocol (for instance, it starts with 0x4d4d). You cannot use pickle to dump to such a format, since pickle is itself a specific protocol.
The same holds for reading binary files into Python objects. You cannot run pickle over a .3ds file, since pickle does not know the protocol. It does not know what 0x4d4d at the beginning of the file means: it could be a 16-bit integer (19789), a 2-character ASCII string ('MM'), etc. Most binary files are designed for one purpose, and you need to understand the protocol in order to read or write them.
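The ambiguity of the bytes 0x4d 0x4d can be shown directly. This is only a sketch: the header layout in the second half (a 16-bit ID followed by a 32-bit length, little-endian) is an assumption made for illustration, not a verified description of the .3ds specification.

```python
import struct

raw = b"MM"  # the two bytes 0x4d 0x4d

# The same bytes read under two different "protocols":
as_int = struct.unpack("<H", raw)[0]  # unsigned 16-bit integer: 19789
as_text = raw.decode("ascii")         # two ASCII characters: 'MM'

# Hypothetical chunk header (assumed layout, for illustration only):
# 16-bit chunk ID followed by a 32-bit chunk length, little-endian.
header = struct.pack("<HI", 0x4D4D, 6)
chunk_id, chunk_len = struct.unpack("<HI", header)
```

Only the format string, which encodes your knowledge of the file format, decides which reading is the right one.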
pickle, on the other hand, is a tool designed to store Python objects in a binary stream, so that we can load these objects back when we need them. It defines its own protocol. For instance, with protocol 2 and later the stream starts with byte 128, followed by the protocol version (2, 3, and so on). In protocols 2 and 3, the next byte is an opcode identifying the kind of object being pickled (for instance 75 for a small integer, 88 for a string, etc.).
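These leading bytes can be inspected directly. The sketch below pins the protocol to 2 so the output does not depend on the default protocol of the Python version in use. The variable names are illustrative.

```python
import pickle

# Pickle a small integer with an explicitly chosen protocol.
data = pickle.dumps(1, protocol=2)

proto_marker = data[0]  # 128: the PROTO opcode
proto_version = data[1]  # 2: the protocol version we asked for
int_opcode = data[2]     # 75 (ord('K')): a one-byte unsigned integer follows

# A string uses a different opcode in the same position.
text = pickle.dumps("a", protocol=2)
str_opcode = text[2]     # 88 (ord('X')): a length-prefixed UTF-8 string follows
```

For a readable listing of every opcode in a stream, the standard library's pickletools.dis(data) can be used.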
pickle also has to serialize everything the object references, and keep track of the objects it has already serialized, since the structure can contain cycles. For instance, if we have two dictionaries:
d = {}
e = {'a': d}
d['a'] = e
then we cannot simply serialize d and serialize e as part of d. We have to keep track of the fact that we have already serialized d, since serializing e would otherwise result in serializing d again, and so on, until we run out of memory.
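A short sketch showing that pickle handles such a cycle: the two dictionaries are rebuilt from the description above, round-tripped through pickle, and the restored structure is checked for the same cycle. The names restored and cycle_preserved are illustrative.

```python
import pickle

# Two dictionaries that reference each other.
d = {}
e = {'a': d}
d['a'] = e

# pickle remembers objects it has already written, so this terminates.
restored = pickle.loads(pickle.dumps(d))

# The cycle survives the round trip: following 'a' twice leads back
# to the very same restored dictionary object.
cycle_preserved = restored['a']['a'] is restored
```

The restored dictionaries are new objects, distinct from d and e, but they point at each other in the same way.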
pickle is thus a specific protocol for storing Python objects. We cannot use it to serialize to some other specific format that other (non-Python) programs expect to read.