_pickle in python3 doesn't work for large data saving

Tags:

python

pickle

I am trying to use _pickle to save data to disk, but calling _pickle.dump raises an error:

OverflowError: cannot serialize a bytes object larger than 4 GiB

Is this a hard limitation of _pickle? (cPickle in Python 2)

428

asked Apr 17 '15 16:04

Jake0x32

2 Answers

Not anymore as of Python 3.4, which implements PEP 3154 and pickle protocol 4:
https://www.python.org/dev/peps/pep-3154/

But you need to request version 4 of the protocol explicitly, and the file must be opened in binary mode:
https://docs.python.org/3/library/pickle.html

with open("file", "wb") as f:
    pickle.dump(d, f, protocol=4)
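As a minimal sketch of that call in context, the following round-trips a small stand-in object through a temporary file using protocol 4. The names data, path and restored are illustrative, and the payload is deliberately tiny; the same call is what lifts the 4 GiB limit for real data.

```python
import os
import pickle
import tempfile

# Small stand-in payload; a real case would hold a bytes object over 4 GiB.
data = {"blob": bytes(1024)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")

    # Pickle data is binary, so the file is opened with "wb" / "rb".
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=4)

    with open(path, "rb") as f:
        restored = pickle.load(f)
```

pickle.load detects the protocol from the stream, so no protocol argument is needed when reading.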

105

answered Oct 20 '22 04:10

Eric Levieil

Yes, this is a hard-coded limit; from the save_bytes function:

else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;          /* string too large */
}

Up to protocol 3, the format uses 4 bytes to write the size of the object to disk, which means it can only record sizes up to 2³² − 1 bytes, just under 4 GiB.
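The arithmetic behind that limit can be checked with the struct module. This is only an illustration of a 4-byte unsigned length field, not the pickle implementation itself; the variable names are made up for the example.

```python
import struct

# Largest value that fits in a 4-byte unsigned integer.
MAX_4_BYTE_LENGTH = 2**32 - 1

# Fits: four 0xff bytes.
encoded = struct.pack("<I", MAX_4_BYTE_LENGTH)

# One more does not fit in 4 bytes.
try:
    struct.pack("<I", MAX_4_BYTE_LENGTH + 1)
    fits_in_4_bytes = True
except struct.error:
    fits_in_4_bytes = False

# An 8-byte length field, as protocol 4 uses for large objects, has room.
encoded_8 = struct.pack("<Q", MAX_4_BYTE_LENGTH + 1)
```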

If you can break up the bytes object into multiple objects, each smaller than 4GB, you can still save the data to a pickle, of course.
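One way the splitting could look is sketched below. dump_in_chunks and load_chunks are hypothetical helpers, not part of the pickle module, and the chunk size is an arbitrary choice that only needs to stay below 4 GiB. The chunk count is pickled first so the reader knows how many objects follow.

```python
import io
import pickle


def dump_in_chunks(data, file, chunk_size=2**30, protocol=3):
    """Pickle a bytes-like object as a count followed by separate chunks."""
    view = memoryview(data)  # slicing a memoryview avoids copying up front
    starts = range(0, len(view), chunk_size)
    pickle.dump(len(starts), file, protocol=protocol)
    for start in starts:
        chunk = bytes(view[start:start + chunk_size])
        pickle.dump(chunk, file, protocol=protocol)


def load_chunks(file):
    """Read the chunk count, then reassemble the original bytes."""
    count = pickle.load(file)
    return b"".join(pickle.load(file) for _ in range(count))


# Small demonstration with an in-memory file and a tiny chunk size.
buffer = io.BytesIO()
dump_in_chunks(b"abcdefghij", buffer, chunk_size=4)
buffer.seek(0)
reassembled = load_chunks(buffer)
```

Reassembling with b"".join needs memory for the whole object again, so for very large data it may be preferable to process the chunks one at a time instead.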

answered Oct 20 '22 03:10

Martijn Pieters
