Is there a way to tell Pandas to use a specific pickle protocol (e.g. 4) when writing an HDF5 file?
Here is the situation (much simplified):
Client A is using python=3.8.1 (as well as pandas=1.0.0 and pytables=3.6.1). A writes some DataFrame using df.to_hdf(file, key).
Client B is using python=3.7.1 (and, as it happened, pandas=0.25.1 and pytables=3.5.2, but that's irrelevant). B tries to read the data written by A using pd.read_hdf(file, key), and fails with ValueError: unsupported pickle protocol: 5.
Mind you, this doesn't happen with a purely numerical DataFrame (e.g. pd.DataFrame(np.random.normal(size=(10,10)))). So here is a reproducible example:
(base) $ conda activate py38
(py38) $ python
Python 3.8.1 (default, Jan 8 2020, 22:29:32)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame(['hello', 'world'])
>>> df.to_hdf('foo', 'x')
>>> exit()
(py38) $ conda deactivate
(base) $ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_hdf('foo', 'x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 407, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 782, in select
return it.get_result()
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 1639, in get_result
results = self.func(self.start, self.stop, where)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 766, in func
return s.read(start=_start, stop=_stop, where=_where, columns=columns)
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 3206, in read
"block{idx}_values".format(idx=i), start=_start, stop=_stop
File "/opt/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py", line 2737, in read_array
ret = node[0][start:stop]
File "/opt/anaconda3/lib/python3.7/site-packages/tables/vlarray.py", line 681, in __getitem__
return self.read(start, stop, step)[0]
File "/opt/anaconda3/lib/python3.7/site-packages/tables/vlarray.py", line 825, in read
outlistarr = [atom.fromarray(arr) for arr in listarr]
File "/opt/anaconda3/lib/python3.7/site-packages/tables/vlarray.py", line 825, in <listcomp>
outlistarr = [atom.fromarray(arr) for arr in listarr]
File "/opt/anaconda3/lib/python3.7/site-packages/tables/atom.py", line 1227, in fromarray
return six.moves.cPickle.loads(array.tostring())
ValueError: unsupported pickle protocol: 5
>>>
Note: I also tried reading using pandas=1.0.0 (and pytables=3.6.1) in python=3.7.4. That fails too, so I believe it is simply the Python version (3.8 writer vs 3.7 reader) that causes the problem. This makes sense, since pickle protocol 5 was introduced by PEP 574 in Python 3.8.
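To see which protocol a given pickle payload declares, here is a minimal sketch using only the standard library. The helper name pickle_protocol is made up for illustration; it relies on the fact that pickles written with protocol 2 or later start with the PROTO opcode (0x80) followed by a single version byte.

```python
import pickle


def pickle_protocol(data: bytes) -> int:
    """Return the protocol number declared at the start of a pickle stream.

    Protocols 2 and later begin with the PROTO opcode (0x80) followed by one
    byte holding the version. Protocols 0 and 1 carry no such header, so this
    (hypothetical) helper returns -1 for them.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return -1


# The same object pickled with an explicit protocol and with the highest one.
payload_v4 = pickle.dumps(['hello', 'world'], protocol=4)
payload_top = pickle.dumps(['hello', 'world'], protocol=pickle.HIGHEST_PROTOCOL)

print(pickle_protocol(payload_v4))   # 4
print(pickle_protocol(payload_top))  # depends on the interpreter's HIGHEST_PROTOCOL
```

A reader whose interpreter has a lower pickle.HIGHEST_PROTOCOL than the number reported here cannot load the payload, which matches the error above.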
Pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle when using the “fixed” format.
PyTables uses the highest pickle protocol by default, which is hardcoded here: https://github.com/PyTables/PyTables/blob/50dc721ab50b56e494a5657e9c8da71776e9f358/tables/atom.py#L1216
As a workaround, you can monkey-patch the pickle module on client A, which writes the HDF file. You should do that before importing pandas:
import pickle
pickle.HIGHEST_PROTOCOL = 4  # cap the protocol before pandas is imported
import pandas
df.to_hdf(file, key)  # df, file and key as in the question
Now the HDF file is created using pickle protocol version 4 instead of version 5.
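If you would rather not change the protocol for the whole process, the same monkey-patch can be scoped to a block. This is only a sketch: capped_pickle_protocol and dump_with_highest are names made up here, and dump_with_highest is a stand-in for a library that passes pickle.HIGHEST_PROTOCOL to pickle.dumps. It assumes the library looks the attribute up at call time, which I have not verified for every PyTables version.

```python
import contextlib
import pickle


@contextlib.contextmanager
def capped_pickle_protocol(protocol=4):
    """Temporarily lower pickle.HIGHEST_PROTOCOL, restoring it afterwards."""
    original = pickle.HIGHEST_PROTOCOL
    pickle.HIGHEST_PROTOCOL = min(original, protocol)
    try:
        yield
    finally:
        pickle.HIGHEST_PROTOCOL = original


def dump_with_highest(obj):
    # Hypothetical stand-in for library code that pickles with the highest
    # protocol, reading pickle.HIGHEST_PROTOCOL each time it is called.
    return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)


with capped_pickle_protocol(4):
    capped = dump_with_highest(['hello', 'world'])

print(capped[1])  # protocol byte of the payload written inside the block: 4
```

Under that assumption, putting the df.to_hdf(file, key) call inside the with block should have the same effect as the global patch, while leaving the rest of the program on the interpreter's default. If the assumption does not hold for your version, fall back to the global patch before importing pandas.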