I have a large structure of primitive types inside nested dicts/lists. The structure is quite complicated and doesn't really matter here.
If I represent it in Python's built-in types (dict/list/float/int/str) it takes 1.1 GB, but if I store it in protobuf and load it into memory it is significantly smaller: ~250 MB total.
I'm wondering how this can be. Are the built-in types in Python inefficient in comparison to some external library?
Edit: The structure is loaded from a JSON file, so there are no internal references between objects.
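For reference, one way to reproduce this kind of measurement is to ask for the deep (recursive) size of the loaded structure. A minimal sketch using the third-party pympler package (the file name is just a placeholder):

    import json
    from pympler import asizeof  # third-party: pip install pympler

    # Load the nested dict/list structure from JSON (placeholder file name).
    with open("data.json") as f:
        data = json.load(f)

    # asizeof recursively sums the sizes of all contained Python objects,
    # which is what gets compared against the protobuf in-memory size.
    print(f"deep size: {asizeof.asizeof(data) / 2**20:.1f} MiB")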
"Simple" python objects, such as int
or float
, need much more memory than their C-counterparts used by protobuf
.
Let's take a list
of Python integers as example compared to an array of integers, as for example in an array.array
(i.e. array.array('i', ...)
).
The analysis for array.array
is simple: discarding some overhead from the array.arrays
-object, only 4 bytes (size of a C-integer) are needed per element.
The situation is completely different for a list of integers:

- the list does not store the integer values themselves, but pointers to integer objects, i.e. 8 additional bytes per element for a 64bit executable;
- each integer object needs 28 bytes (see import sys; sys.getsizeof(1), which returns 28): 8 bytes are needed for reference counting, 8 bytes to hold a pointer to the integer-function table, 8 bytes are needed for the size of the integer value (Python's integers can be much bigger than 2^32), and at least 4 bytes to hold the integer value itself.

Together with the allocator's alignment and the list's over-allocation overhead, this means a whopping cost of about 40.5 bytes per Python integer, compared to the possible 4 bytes (or 8 bytes if we use long long int, i.e. 64bit integers).
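These numbers are easy to check interactively. A small sketch (the exact totals depend on the Python version and platform, so treat the values as approximate):

    import sys
    from array import array

    n = 1_000_000
    ints = list(range(1000, 1000 + n))   # values outside the small-int cache
    packed = array('i', ints)            # raw C ints, 4 bytes each

    print(sys.getsizeof(1))              # 28 on a typical 64-bit CPython
    # list container (one 8-byte pointer per slot) plus the integer objects themselves
    print(sys.getsizeof(ints) + sum(sys.getsizeof(x) for x in ints))
    # array.array: ~4 bytes per element plus a small constant header
    print(sys.getsizeof(packed))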
The situation is similar for a list of Python floats compared to an array of doubles (i.e. array.array('d', ...)), which only needs about 8 bytes per element. But for a list we have:

- 8 additional bytes per element for the pointer (for a 64bit executable);
- 24 bytes for the float object itself (see import sys; sys.getsizeof(1.0), which returns 24): 8 bytes are needed for reference counting, 8 bytes to hold a pointer to the float-function table, and 8 bytes to hold the double value itself.

That means about 32.5 bytes per Python float in a list vs. 8 bytes for a C double.
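The same kind of check for floats, this time printing the approximate per-element cost (again a sketch; exact numbers vary slightly between CPython versions):

    import sys
    from array import array

    n = 1_000_000
    floats = [float(i) for i in range(n)]
    packed = array('d', floats)          # raw C doubles, 8 bytes each

    print(sys.getsizeof(1.0))            # 24 on a typical 64-bit CPython
    # per-element cost of the list: pointer + float object
    print((sys.getsizeof(floats) + sum(sys.getsizeof(x) for x in floats)) / n)
    # per-element cost of the packed array
    print(sys.getsizeof(packed) / n)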
protobuf internally uses the same kind of compact, C-like representation of the data as array.array and thus needs much less memory (about 4-5 times less, as you observe). numpy.array is another example of a data type that holds raw C values and thus needs much less memory than lists.
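numpy makes this easy to see, because the array's buffer holds the raw C values directly (a sketch, assuming numpy is installed):

    import sys
    import numpy as np

    n = 1_000_000
    a = np.arange(n, dtype=np.int32)

    print(a.nbytes)                      # 4 * n bytes: the raw 32-bit values
    lst = a.tolist()
    # the equivalent list needs a pointer per element plus one int object per element
    print(sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst))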
If one doesn't need to look things up in a dictionary, then storing the key-value pairs in a list needs less memory than a dictionary, because no search structure (which costs memory) has to be maintained. This is another reason for the smaller memory footprint of protobuf data.
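You can see the cost of the lookup structure by comparing a dict with two plain lists holding the same keys and values (a sketch that only measures the containers, since the key/value objects themselves are identical in both variants):

    import sys

    n = 100_000
    keys = [str(i) for i in range(n)]
    vals = [0] * n

    as_dict = dict(zip(keys, vals))      # hash table with a sparse index for O(1) lookup

    print(sys.getsizeof(as_dict))                       # the dict's table alone
    print(sys.getsizeof(keys) + sys.getsizeof(vals))    # two flat pointer arrays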
To answer your other question: there is no built-in module that is to Python's dict what array.array is to Python's list, so I'll use this opportunity to shamelessly plug an advertisement for a library of mine: cykhash.
Sets and maps from cykhash need less than 25% of the memory of Python's set/dict, but are about as fast.
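A minimal sketch of what using cykhash looks like, assuming its Int64Set type with the set-like add/contains interface described in the project's README (check the project for the exact constructors and the map types):

    from cykhash import Int64Set   # third-party: pip install cykhash

    n = 1_000_000
    py_set = set(range(n))         # stores pointers to boxed int objects plus a sparse table

    cy_set = Int64Set()            # khash-based set storing raw 64-bit integers
    for i in range(n):
        cy_set.add(i)

    print(n - 1 in py_set, n - 1 in cy_set)   # membership works the same way in both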