
Python 3 - Can pickle handle byte objects larger than 4GB?

Based on this comment and the referenced documentation, pickle protocol 4 (available since Python 3.4) should be able to pickle byte objects larger than 4 GB.

However, using Python 3.4.3 or Python 3.5.0b2 on Mac OS X 10.10.4, I get an error when I try to pickle a large byte array:

    >>> import pickle
    >>> x = bytearray(8 * 1000 * 1000 * 1000)
    >>> fp = open("x.dat", "wb")
    >>> pickle.dump(x, fp, protocol = 4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

Is there a bug in my code or am I misunderstanding the documentation?

RandomBits asked Jul 17 '15 03:07




1 Answer

Here is a simple workaround for Python issue 24658. Serialize in memory with pickle.dumps (or deserialize with pickle.loads), and move the resulting bytes object to or from the file in chunks of at most 2**31 - 1 bytes.

    import os.path
    import pickle

    file_path = "pkl.pkl"
    n_bytes = 2**31
    max_bytes = 2**31 - 1  # largest chunk passed to a single write() or read()
    data = bytearray(n_bytes)

    ## write
    # protocol=4 is required for objects over 4 GB; it is only the default from Python 3.8
    bytes_out = pickle.dumps(data, protocol=4)
    with open(file_path, 'wb') as f_out:
        for idx in range(0, len(bytes_out), max_bytes):
            f_out.write(bytes_out[idx:idx+max_bytes])

    ## read
    bytes_in = bytearray(0)
    input_size = os.path.getsize(file_path)
    with open(file_path, 'rb') as f_in:
        for _ in range(0, input_size, max_bytes):
            bytes_in += f_in.read(max_bytes)
    data2 = pickle.loads(bytes_in)

    assert data == data2
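A possible variation on the same idea, offered as a sketch rather than as part of the original answer: instead of building the whole pickle in memory and slicing it by hand, wrap the file object so that every write and read is split into chunks, then call pickle.dump and pickle.load as usual. The names ChunkedFile, dump_large and load_large are hypothetical helpers made up for this illustration, and the 2**31 - 1 limit is simply carried over from the workaround above.

```python
import pickle

MAX_CHUNK = 2**31 - 1  # per-call limit, carried over from the workaround above


class ChunkedFile:
    """Hypothetical wrapper that splits large reads and writes into chunks."""

    def __init__(self, f, max_chunk=MAX_CHUNK):
        self.f = f
        self.max_chunk = max_chunk

    def write(self, data):
        # A memoryview lets us slice the payload without copying it.
        view = memoryview(data).cast("B")
        for idx in range(0, len(view), self.max_chunk):
            self.f.write(view[idx:idx + self.max_chunk])
        return len(view)

    def read(self, n=-1):
        # n < 0 (or None) means "read until end of file".
        to_eof = n is None or n < 0
        chunks = []
        while to_eof or n > 0:
            size = self.max_chunk if to_eof else min(n, self.max_chunk)
            chunk = self.f.read(size)
            if not chunk:
                break  # end of file reached
            chunks.append(chunk)
            if not to_eof:
                n -= len(chunk)
        return b"".join(chunks)

    def readline(self):
        # pickle.load requires a readline() method; delegate to the real file.
        return self.f.readline()


def dump_large(obj, path, max_chunk=MAX_CHUNK):
    # Assumption: protocol 4 is used so that objects over 4 GB are supported.
    with open(path, "wb") as f:
        pickle.dump(obj, ChunkedFile(f, max_chunk), protocol=4)


def load_large(path, max_chunk=MAX_CHUNK):
    with open(path, "rb") as f:
        return pickle.load(ChunkedFile(f, max_chunk))
```

Usage would look like dump_large(data, "pkl.pkl") followed by data2 = load_large("pkl.pkl"). Passing a small max_chunk makes it possible to exercise the chunking logic without allocating gigabytes of memory.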
lunguini answered Oct 05 '22 23:10
