
Pandas backwards compatibility issue with pickle 0.14.1 and 0.15.2

We're using the pandas DataFrame as the primary container for our time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, along with metadata keys describing the time series.

We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.

Create binary blob of pandas DataFrame (0.14.1):

import lz4
import cPickle

bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))

Error Case: Read back in from mongoDB with pandas 0.15.2

cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success Case: Read back in from mongoDB with pandas 0.14.1 with no error.

This seems similar to an old Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, occurs because the Python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."
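To make the quoted explanation concrete: a pickle stores a reference to each class by module and name, and the unpickler looks that class up again at load time. A compatibility loader can intercept that lookup and substitute the current class. Below is a minimal sketch of that general technique in Python 3 using only the standard library. `LegacyIndex`, `CurrentIndex`, `RENAMED`, `CompatUnpickler` and `compat_loads` are hypothetical names made up for illustration, and this is not pandas' actual implementation.

```python
import io
import pickle


class LegacyIndex:
    """Hypothetical stand-in for a class as it existed when the data was pickled."""

    def __init__(self, values):
        self.values = list(values)


class CurrentIndex:
    """Hypothetical stand-in for the redesigned class in the newer version."""

    def __init__(self, values):
        self.values = list(values)


# Hypothetical lookup table: (module, class name) stored in old pickles
# mapped to the class that should be used when loading today.
RENAMED = {(LegacyIndex.__module__, "LegacyIndex"): CurrentIndex}


class CompatUnpickler(pickle.Unpickler):
    """Unpickler that redirects references to legacy classes."""

    def find_class(self, module, name):
        replacement = RENAMED.get((module, name))
        if replacement is not None:
            return replacement
        # Anything not in the table is resolved the default way.
        return super().find_class(module, name)


def compat_loads(data):
    """Load pickle bytes through the compatibility unpickler."""
    return CompatUnpickler(io.BytesIO(data)).load()


# Pickle an instance of the legacy class, then load it back through the
# compatibility unpickler, which rebuilds it as the current class.
blob = pickle.dumps(LegacyIndex([1, 2, 3]), protocol=2)
restored = compat_loads(blob)
```

The plain `pickle.loads(blob)` would rebuild a `LegacyIndex`. Routing the load through `find_class` is what lets a library keep old pickles readable after its classes change.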

Any ideas for a workaround or solution?

To recreate error:

Setup in pandas 0.14.1 env:

import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 10))
cPickle.dump(df, open("cp0141.p", "wb"))
cPickle.load(open("cp0141.p", "rb"))  # no error

Create error in pandas 0.15.2 env:

cPickle.load(open('cp0141.p', 'rb'))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))
Asked by Mike D on Jan 14 '15 at 19:01

1 Answer

This change was explicitly mentioned in the pandas 0.15.0 release notes: the Index class no longer subclasses ndarray but is now a pandas object.

You simply need to use pd.read_pickle to read the pickles.
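Since the question stores the pickles as compressed blobs rather than files, here is one way this might look in practice. It is a sketch in Python 3 with some assumptions: `zlib` stands in for `lz4` so the example needs no extra dependency, `load_frame_blob` is a hypothetical helper name, and the blob is written to a temporary file because `pd.read_pickle` has always accepted a file path. I believe recent pandas versions also accept a file-like object, which would avoid the temporary file.

```python
import os
import pickle
import tempfile
import zlib

import pandas as pd


def load_frame_blob(blob):
    """Decompress a stored blob and load it via pd.read_pickle.

    Hypothetical helper: zlib is used here in place of lz4.
    """
    raw = zlib.decompress(blob)
    # Write the raw pickle bytes to a temporary file so that
    # pd.read_pickle can be given a plain path.
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        return pd.read_pickle(path)
    finally:
        os.remove(path)


# Round trip within a single pandas version, just to show the call pattern.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
blob = zlib.compress(pickle.dumps(df, pickle.HIGHEST_PROTOCOL))
restored = load_frame_blob(blob)
```

This round trip runs in one environment, so it only shows the call pattern. The cross-version benefit comes from `pd.read_pickle` itself when the blob was written by an older pandas.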

Answered by Jeff on Oct 02 '22 at 15:10