
pandas.DataFrame.load/save between python2 and python3: pickle protocol issues

Tags: python, pandas

I haven't figured out how to pickle load/save pandas DataFrames between Python 2 and 3. There is a 'protocol' option in the pickler that I've played with unsuccessfully, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the error:

python2.7

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
ValueError: unsupported pickle protocol: 3

python3

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

Maybe expecting pickle to work between Python versions is a bit optimistic?

mathtick asked Jan 29 '13 15:01


2 Answers

I had the same problem. You can change the protocol of the DataFrame pickle file by running the following function under Python 3 (note that it overwrites the file in place):

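import pickle

def change_pickle_protocol(filepath, protocol=2):
    """Re-save the pickle at filepath in place using the given protocol."""
    # Load the object with whatever protocol the file was written in.
    with open(filepath, 'rb') as f:
        obj = pickle.load(f)
    # Overwrite the same file; protocol 2 is the newest one Python 2 can read.
    with open(filepath, 'wb') as f:
        pickle.dump(obj, f, protocol=protocol)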

Then you should be able to open it in Python 2 without problems, since protocol 2 is the highest protocol that Python 2 supports.
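For the opposite direction (the UnicodeDecodeError in the question), here is a minimal Python 3 sketch. It assumes the failure comes from Python 2 `str` objects that Python 3 tries to decode as ASCII; `pickle.loads` accepts an `encoding` argument for that case. The helper names `pickle_protocol` and `load_py2_pickle` and the hand-written sample bytes are made up for illustration, and a real DataFrame pickle may need more than this.

```python
import pickle


def pickle_protocol(data):
    """Return the protocol number declared in a pickle's header, or None.

    Protocols 2 and later start with the PROTO opcode (0x80) followed by
    one byte holding the protocol number; protocols 0 and 1 have no header.
    """
    if len(data) >= 2 and data[0:1] == b"\x80":
        return data[1]
    return None


def load_py2_pickle(data, encoding="latin1"):
    """Unpickle bytes written by Python 2, decoding its 8-bit str objects."""
    return pickle.loads(data, encoding=encoding)


# Hand-written protocol 2 pickle of the Python 2 str '\xf4' (illustrative data).
PY2_SAMPLE = b"\x80\x02U\x01\xf4q\x00."

# A protocol 3 pickle announces itself in the header, which is what
# Python 2 rejects with "unsupported pickle protocol: 3".
print(pickle_protocol(pickle.dumps([1.5, 2.5], protocol=3)))
print(pickle_protocol(pickle.dumps([1.5, 2.5], protocol=2)))

# The default ASCII decoding fails on the non-ASCII byte.
try:
    pickle.loads(PY2_SAMPLE)
except UnicodeDecodeError as exc:
    print("default load failed:", exc)

# latin1 maps every byte to a code point, so the load succeeds.
print(repr(load_py2_pickle(PY2_SAMPLE)))
```

Passing `encoding="bytes"` instead keeps those values as `bytes` objects, which may be preferable when the Python 2 strings held binary data rather than text.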

ben.dichter answered Oct 18 '22 11:10


If you use pandas.DataFrame.to_pickle(), you can patch the pandas source as follows so that it accepts a pickle protocol argument:

1) In the source file /pandas/io/pickle.py (back up the original first as /pandas/io/pickle.py.ori), find the following lines:

def to_pickle(obj, path):

pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)

Change these lines to:

def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):

pkl.dump(obj, f, protocol=protocol)

2) In the source file /pandas/core/generic.py (back up the original first as /pandas/core/generic.py.ori), find the following lines:

def to_pickle(self, path):

return to_pickle(self, path)

Change these lines to:

def to_pickle(self, path, protocol=None):

return to_pickle(self, path, protocol)

3) Restart your Python kernel if it is running, then save your DataFrame using any available pickle protocol (0, 1, 2, 3, 4):

# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)

# protocol=None is passed through, so pickle picks its own default (3 or higher on Python 3); Python 2.x cannot read this
df.to_pickle('my_dataframe.pck')

4) After every pandas upgrade, repeat steps 1 and 2.

5) (optional) Ask the developers to add this capability to the official releases, because your code will throw an exception in any other Python environment that lacks these changes.
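If patching the installed pandas feels too fragile, a rough alternative is to call the standard `pickle` module yourself with an explicit protocol. The sketch below is only an illustration: `dump_with_protocol` and `read_header` are made-up helper names, and a plain dict stands in for the DataFrame so that it runs without pandas. Newer pandas releases also document a `protocol` argument on `to_pickle`, so it is worth checking your version before patching anything.

```python
import os
import pickle
import tempfile


def dump_with_protocol(obj, path, protocol=2):
    """Pickle obj to path with an explicit protocol (2 is readable by Python 2)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def read_header(path):
    """Return the first two bytes of the file, which hold the protocol marker."""
    with open(path, "rb") as f:
        return f.read(2)


# Stand-in for a DataFrame so the sketch runs without pandas installed.
table = {"col_a": [1.0, 2.0], "col_b": [3.0, 4.0]}

path = os.path.join(tempfile.mkdtemp(), "my_dataframe.pck")
dump_with_protocol(table, path, protocol=2)
print(read_header(path))  # protocol 2 files start with b'\x80\x02'

# Round trip to confirm the file is readable.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == table)
```

This keeps the choice of protocol in your own code, so nothing has to be reapplied after a pandas upgrade.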

Nice day!

ragesz answered Oct 18 '22 09:10