I am trying to use _pickle to save data to disk, but when calling _pickle.dump I get the error

OverflowError: cannot serialize a bytes object larger than 4 GiB

Is this a hard limit of _pickle (cPickle in Python 2)?
Cons-1: Pickle is unsafe. Unlike JSON, which is just text, it is possible to construct malicious pickle data that will execute arbitrary code during unpickling. Therefore, we should NEVER unpickle data that could have come from an untrusted source, or that could have been tampered with.
Python's pickle is, however, perfectly cross-platform: a pickle written on one OS or architecture can be read on any other by a compatible Python. It is Python-specific, though, so other languages cannot easily read it.
Cons-2: Pickle is slow. Pickle is both slower and produces larger serialized values than most of the alternatives. Pickle is the clear underperformer here: even the cPickle extension written in C has a serialization rate about a quarter that of JSON or Thrift.
The insecurity is not because pickles contain code, but because they create objects by calling constructors named in the pickle. Any callable can be used in place of your class name to construct objects. Malicious pickles will use other Python callables as the "constructors": for example, instead of naming your harmless model class, a malicious pickle can name a dangerous callable such as os.system.
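To make that mechanism concrete, here is a minimal, deliberately harmless sketch (the Evil class name is my own, and the "attack" merely runs echo):

import os
import pickle

class Evil:
    def __reduce__(self):
        # Tell pickle to "reconstruct" this object by calling os.system(...)
        return (os.system, ('echo arbitrary code ran during unpickling',))

payload = pickle.dumps(Evil())
pickle.loads(payload)  # the os.system call executes here, at load time

Note that the attacker does not even need the Evil class to exist on the victim's machine: the pickle names os.system directly, so the bytes in payload alone are enough to trigger the call.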
Not anymore as of Python 3.4, which implements PEP 3154 (pickle protocol 4):
https://www.python.org/dev/peps/pep-3154/
But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html
pickle.dump(d, open("file", "wb"), protocol=4)  # note: the file must be opened in binary mode
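For completeness, a minimal round-trip sketch (the data and "big.pkl" names are illustrative; the payload here is small, but protocol 4 accepts bytes objects past 4 GiB):

import pickle

data = {"payload": b"\x00" * (2**20)}  # imagine a > 4 GiB bytes object here

with open("big.pkl", "wb") as f:       # binary mode is required
    pickle.dump(data, f, protocol=4)

with open("big.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == data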
Yes, this is a hard-coded limit; from the save_bytes function in CPython's _pickle.c:
else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;          /* string too large */
}
The protocol uses 4 bytes to write the size of the object to disk, which means it can only track sizes up to 2**32 bytes == 4 GiB.
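You can see that 4-byte length prefix with the standard pickletools module (a small sketch; the 300-byte payload just forces the BINBYTES opcode, which is used for lengths over 255):

import pickle
import pickletools

blob = pickle.dumps(b"x" * 300, protocol=3)
pickletools.dis(blob)
# BINBYTES stores the length in 4 little-endian bytes, hence the 2**32 - 1
# byte ceiling. Protocol 4 adds BINBYTES8, which stores it in 8 bytes.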
If you can break up the bytes object into multiple objects, each smaller than 4 GiB, you can of course still save the data to a pickle.
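A minimal sketch of that workaround (the function names and the 2 GiB chunk size are my own, not part of the answer):

import pickle

CHUNK = 2**31  # 2 GiB per piece; anything under 4 GiB works

def dump_big_bytes(data: bytes, path: str) -> None:
    # Split the bytes object into sub-4GiB chunks and pickle the list of chunks.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with open(path, "wb") as f:
        pickle.dump(chunks, f)

def load_big_bytes(path: str) -> bytes:
    with open(path, "rb") as f:
        return b"".join(pickle.load(f))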