Pandas backwards compatibility issue with pickle 0.14.1 and 0.15.2

We use the pandas DataFrame as the primary container for our time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, along with keys holding metadata about the time series.

We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.

Create a binary blob from a pandas DataFrame (0.14.1):

import cPickle
import lz4

# Pickle the DataFrame with the highest protocol, then compress the bytes.
bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))

Error case: reading the blob back from MongoDB with pandas 0.15.2

cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success case: reading the blob back from MongoDB with pandas 0.14.1 works with no error.

This seems similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, is that the python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."

Any ideas for a workaround or solution?

To recreate error:

Setup in pandas 0.14.1 env:

import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 10))
cPickle.dump(df, open("cp0141.p", "wb"))
cPickle.load(open("cp0141.p", "rb"))  # no error

Create error in pandas 0.15.2 env:

cPickle.load(open("cp0141.p", "rb"))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))

asked Jan 14 '15 19:01

Mike D

1 Answer

This was explicitly mentioned: the Index class no longer subclasses ndarray and is instead a pandas object; see here.
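As a rough illustration of how a loader can cope with a class that changed after a pickle was written, here is a minimal standard-library sketch. It is a generic example only and not pandas' actual implementation: `LegacyRecord`, `CurrentRecord`, and `CompatUnpickler` are hypothetical names, and the code targets Python 3 (`pickle` rather than `cPickle`). A pickle stores a reference to each class by module and name, so overriding `Unpickler.find_class` lets the loader substitute a compatible class at load time.

```python
import io
import pickle


class LegacyRecord:
    """Hypothetical: the class as it existed when the pickle was written."""

    def __init__(self, value):
        self.value = value


class CurrentRecord:
    """Hypothetical: the redesigned class in the newer library version."""

    def __init__(self, value):
        self.value = value


class CompatUnpickler(pickle.Unpickler):
    """Redirects lookups of old class names to their replacements."""

    # Keys are the (module, name) pairs recorded inside old pickles.
    RENAMES = {(LegacyRecord.__module__, "LegacyRecord"): CurrentRecord}

    def find_class(self, module, name):
        replacement = self.RENAMES.get((module, name))
        if replacement is not None:
            return replacement
        # Anything not listed is resolved the default way.
        return super().find_class(module, name)


# Pickle an instance of the old class, then load it through the shim.
payload = pickle.dumps(LegacyRecord(42))
restored = CompatUnpickler(io.BytesIO(payload)).load()
print(type(restored).__name__, restored.value)
```

The plain `pickle.loads` call has no such remapping step, which is why it depends on the class definitions matching what was pickled.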

You simply need to use pd.read_pickle to read the pickles.
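A minimal sketch of what that could look like when the pickle lives in memory (for example, bytes pulled from a database) rather than in a file. The helper name `load_frame_from_bytes` is hypothetical, and the example is written for Python 3 with a current pandas, so it only demonstrates the call pattern and cannot reproduce the cross-version error itself. It writes the bytes to a temporary file because `pd.read_pickle` has always accepted a file path; recent pandas versions reportedly also accept a file-like object, but that is not assumed here.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd


def load_frame_from_bytes(raw):
    """Hypothetical helper: unpickle raw pickle bytes through pd.read_pickle."""
    # Write the bytes to a temporary file so read_pickle can be given a path.
    with tempfile.NamedTemporaryFile(suffix=".p", delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        return pd.read_pickle(path)
    finally:
        os.remove(path)


# Stand-in for the stored blob: pickle bytes of a small DataFrame.
df = pd.DataFrame(np.random.randn(10, 10))
raw = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)

restored = load_frame_from_bytes(raw)
print(restored.shape)
```

For the compressed blobs in the question, `raw` would be the decompressed bytes, i.e. the result of the `lz4.decompress(bd)` call, passed to the helper instead of to `cPickle.loads`.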

answered Oct 02 '22 15:10

Jeff

Tags: python, pandas, mongodb, pickle
