
_pickle in python3 doesn't work for large data saving

Tags: python, pickle

I am trying to use _pickle to save data to disk, but when calling _pickle.dump I get an error:

OverflowError: cannot serialize a bytes object larger than 4 GiB 

Is this a hard limitation of _pickle (cPickle in Python 2)?

Jake0x32 asked Apr 17 '15 16:04





2 Answers

Not anymore: Python 3.4 added pickle protocol 4 (PEP 3154), which supports objects larger than 4 GiB.
https://www.python.org/dev/peps/pep-3154/

But you need to request version 4 of the protocol explicitly (it only became the default in Python 3.8):
https://docs.python.org/3/library/pickle.html

    with open("file", "wb") as f:  # pickle writes bytes, so open in binary mode
        pickle.dump(d, f, protocol=4)
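For completeness, here is a minimal round-trip sketch of the same idea, including reading the file back. The helper names `save_with_protocol_4` and `load_pickle`, the file name `data.pkl`, and the small stand-in payload are all hypothetical, chosen only for illustration; a real payload would be much larger.

```python
import os
import pickle
import tempfile


def save_with_protocol_4(obj, path):
    """Write obj to path using pickle protocol 4 (requires Python 3.4+)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=4)


def load_pickle(path):
    """Read a pickled object back; the protocol is detected automatically."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Small stand-in payload (hypothetical); the same calls apply to large data.
payload = {"blob": b"x" * 1024}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    save_with_protocol_4(payload, path)
    restored = load_pickle(path)
```

No protocol argument is needed when loading, since `pickle.load` reads the protocol version from the stream itself.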
Eric Levieil answered Oct 20 '22 04:10



Yes, this is a hard-coded limit; from the save_bytes function:

    else if (size <= 0xffffffffL) {
        // ...
    }
    else {
        PyErr_SetString(PyExc_OverflowError,
                        "cannot serialize a bytes object larger than 4 GiB");
        return -1;          /* string too large */
    }

The protocol uses 4 bytes to write the size of the object to disk, which means it can only track sizes of up to 2**32 bytes == 4 GiB.
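The arithmetic behind the 4-byte size field can be checked with the `struct` module. This is only an illustration of the size limit, not the pickle implementation itself; the helper `fits_in_4_byte_length` is a hypothetical name, and the 8-byte comparison at the end assumes a wider length field of the kind protocol 4 introduced.

```python
import struct

MAX_UINT32 = 2**32 - 1  # largest value a 4-byte unsigned field can hold


def fits_in_4_byte_length(size):
    """Return True if size can be encoded as a 4-byte unsigned integer."""
    try:
        struct.pack("<I", size)  # "<I" = little-endian unsigned 32-bit
        return True
    except struct.error:
        return False


print(fits_in_4_byte_length(MAX_UINT32))  # True
print(fits_in_4_byte_length(2**32))       # False: one past the limit

# An 8-byte field ("<Q") has room for sizes far beyond 4 GiB.
print(len(struct.pack("<Q", 2**32)))      # 8
```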

If you can break up the bytes object into multiple objects, each smaller than 4GB, you can still save the data to a pickle, of course.
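One possible way to do that splitting is sketched below. The function names `dump_in_chunks` and `load_chunks`, the on-disk layout (a chunk count followed by the chunks), and the 1 GiB default chunk size are all assumptions made for this example, not something pickle prescribes.

```python
import io
import pickle


def dump_in_chunks(data, fileobj, chunk_size=2**30):
    """Pickle a bytes-like object as a chunk count followed by the chunks.

    Each chunk is at most chunk_size bytes, so keeping chunk_size below
    4 GiB keeps every pickled bytes object under the limit.
    """
    view = memoryview(data)
    n_chunks = (len(view) + chunk_size - 1) // chunk_size  # ceiling division
    pickle.dump(n_chunks, fileobj)
    for start in range(0, len(view), chunk_size):
        pickle.dump(bytes(view[start:start + chunk_size]), fileobj)


def load_chunks(fileobj):
    """Read back the chunk count, then the chunks, and rejoin them."""
    n_chunks = pickle.load(fileobj)
    return b"".join(pickle.load(fileobj) for _ in range(n_chunks))


# Tiny demonstration with an in-memory file and a deliberately small chunk size.
buf = io.BytesIO()
dump_in_chunks(b"abcdefghij", buf, chunk_size=3)
buf.seek(0)
roundtrip = load_chunks(buf)
```

Rejoining with `b"".join` rebuilds the full object in memory; if that is too expensive, the chunks can be processed one at a time instead.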

Martijn Pieters answered Oct 20 '22 03:10
