I haven't figured out how to load and save pickles between Python 2 and Python 3 with pandas DataFrames. The pickler has a 'protocol' option that I've played with unsuccessfully, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the error:
python2.7
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
ValueError: unsupported pickle protocol: 3
python3
>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Maybe expecting pickle to work between Python versions is a bit optimistic?
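The UnicodeDecodeError in the Python 3 session is what pickle raises when it tries to decode Python 2 byte strings as ASCII. One commonly used workaround, sketched here with only the standard library, is to pass encoding="latin1" to pickle.load; the pickle documentation notes that this is required for NumPy arrays pickled by Python 2. The helper name load_py2_pickle is made up for illustration, and this only addresses the decoding error: it may still fail if the two pandas versions lay out their objects differently.

```python
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle file written by Python 2 from Python 3.

    latin1 maps every byte 0-255 to a character, so decoding the
    Python 2 byte strings cannot fail the way the default ASCII does.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)
```

For the session above this would be called as load_py2_pickle('a2') in Python 3.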
I had the same problem. You can change the protocol of the dataframe pickle file with the following function in python3:
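import pickle

def change_pickle_protocol(filepath, protocol=2):
    # Read the object back with the current interpreter...
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    # ...then overwrite the same file using the requested protocol.
    with open(filepath, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)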
Then you should be able to open it in Python 2 without a problem.
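A slightly more cautious variant, as a sketch: write the converted pickle to a separate file instead of overwriting the original, and check which protocol a file uses by looking at its first two bytes. The names repickle and pickle_protocol are hypothetical. The header check relies on the fact that pickles written with protocol 2 or higher start with the byte 0x80 followed by the protocol number; protocols 0 and 1 have no such marker.

```python
import pickle


def repickle(src_path, dst_path, protocol=2):
    """Load src_path and write the same object to dst_path with `protocol`."""
    with open(src_path, "rb") as f:
        obj = pickle.load(f)
    with open(dst_path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def pickle_protocol(path):
    """Return the protocol number from the file header, or None.

    None means there is no protocol marker, which is the case for
    protocols 0 and 1 (and for files that are not pickles at all).
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[:1] == b"\x80":
        return header[1]
    return None
```

Calling pickle_protocol('a3') before and after the conversion is a quick way to confirm that the file Python 2 will see really uses protocol 2.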
If you use pandas.DataFrame.to_pickle(), you can modify the pandas source code so that the pickle protocol can be set:
1) In the source file /pandas/io/pickle.py (before modifying it, copy the original file to /pandas/io/pickle.py.ori), search for the following lines:
def to_pickle(obj, path):
pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
Change these lines to:
def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):
pkl.dump(obj, f, protocol=protocol)
2) In the source file /pandas/core/generic.py (before modifying it, copy the original file to /pandas/core/generic.py.ori), search for the following lines:
def to_pickle(self, path):
return to_pickle(self, path)
Change these lines to:
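def to_pickle(self, path, protocol=None):
    if protocol is None:
        # fall back to the default set in /pandas/io/pickle.py (HIGHEST_PROTOCOL)
        return to_pickle(self, path)
    return to_pickle(self, path, protocol)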
3) Restart your Python kernel if it is running, then save your DataFrame using any available pickle protocol (0, 1, 2, 3, 4):
# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)
# the protocol will be the highest one (4 here); Python 2.x cannot read this
df.to_pickle('my_dataframe.pck')
4) After upgrading pandas, repeat steps 1 and 2.
5) (Optional) Ask the developers to add this capability to the official releases, because your code will throw an exception in any Python environment that lacks these changes.
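If editing files inside site-packages is not an option, the same effect can be sketched as a standalone function that mirrors the patched signature from step 1 but lives in your own code. The name to_pickle_with_protocol is made up for illustration, and it works on any picklable object, a DataFrame included. Also worth checking before patching anything: as far as I know, pandas 0.21 and later accept a protocol argument on DataFrame.to_pickle directly.

```python
import pickle


def to_pickle_with_protocol(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to path with an explicit protocol, without touching pandas."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)
```

With this helper, to_pickle_with_protocol(df, 'my_dataframe.pck', protocol=2) writes a file that uses protocol 2, and the code keeps working after a pandas upgrade because nothing in the pandas installation was changed.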
Nice day!