
Serialize a dictionary containing pandas data-frames (Python)

I have a dict containing several pandas DataFrames (identified by keys). Any suggestion on how to effectively serialize it (and cleanly load it back)? Here is the structure (pprint output). Each dict['method_x_']['meas_x_'] is a pandas DataFrame. The goal is to save the DataFrames for further plotting with some specific plotting options.

{'method1':
    {'meas1':
                          config1   config2
                   0      0.193647  0.204673
                   1      0.251833  0.284560
                   2      0.227573  0.220327,
     'meas2':
                          config1   config2
                   0      0.172787  0.147287
                   1      0.061560  0.094000
                   2      0.045133  0.034760},
 'method2':
    {'meas1':
                          config1   config2
                   0      0.193647  0.204673
                   1      0.251833  0.284560
                   2      0.227573  0.220327,
     'meas2':
                          config1   config2
                   0      0.172787  0.147287
                   1      0.061560  0.094000
                   2      0.045133  0.034760}}
asked Jul 28 '13 by Wajih

2 Answers

Use pickle.dump(s) and pickle.load(s); they work on a whole dict of DataFrames. In current pandas, a single DataFrame can also be serialized with df.to_pickle("filename") and read back with pd.read_pickle("filename") (the older df.save method has been removed).
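A minimal sketch of the pickle round-trip for a nested dict of DataFrames like the one in the question (the file name all_df.pkl and the sample values are arbitrary):

```python
import pickle

import pandas as pd

# A nested dict of DataFrames, shaped like the structure in the question
all_df = {
    'method1': {
        'meas1': pd.DataFrame({'config1': [0.193647, 0.251833],
                               'config2': [0.204673, 0.284560]}),
        'meas2': pd.DataFrame({'config1': [0.172787, 0.061560],
                               'config2': [0.147287, 0.094000]}),
    }
}

# Serialize the whole dict in one call
with open('all_df.pkl', 'wb') as f:
    pickle.dump(all_df, f)

# Load it back; the nested structure and the DataFrames are restored as-is
with open('all_df.pkl', 'rb') as f:
    restored = pickle.load(f)
```

Using a `with` block (rather than a bare `open(...)` inside the call) makes sure the file is flushed and closed before you try to read it back.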

answered Sep 24 '22 by sjakobi


In my particular use case, I tried a simple pickle.dump(all_df, open("all_df.p", "wb")), and while it loaded properly with all_df = pickle.load(open("all_df.p", "rb")), after I restarted my Jupyter environment I would get UnpicklingError: invalid load key, '\xef'.

One of the methods described here states that we can use HDF5 (PyTables) to do the job. From the pandas docs:

HDFStore is a dict-like object which reads and writes pandas

But it seems to be picky about the tables version you use. I got mine to work after a pip install --upgrade tables and a runtime restart.

If you need an overall idea of how to use it:

# all_df is a dict mapping names to DataFrames
with pd.HDFStore('df_store.h5') as df_store:
    for key, df in all_df.items():
        df_store[key] = df

You should have a df_store.h5 file that you can convert back using the reverse process:

new_all_df = dict()
with pd.HDFStore('df_store.h5') as df_store:
    for i in df_store.keys():
        new_all_df[i] = df_store[i]
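One caveat worth knowing with this round-trip: HDFStore reports its keys with a leading '/', so the restored dict's keys will not match the originals exactly unless you strip the prefix. A small sketch (the file name df_store_demo.h5 and the sample frame are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'config1': [0.172787, 0.061560],
                   'config2': [0.147287, 0.094000]})

# Store one frame under a plain key
with pd.HDFStore('df_store_demo.h5') as store:
    store['meas1'] = df

# Read everything back; store.keys() reports '/meas1',
# so strip the slash to recover the original key names
with pd.HDFStore('df_store_demo.h5') as store:
    restored = {k.lstrip('/'): store[k] for k in store.keys()}
```

Note that HDFStore requires the PyTables package (pip install tables) to be importable, which is exactly the dependency the upgrade above was fixing.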
answered Sep 21 '22 by Johnny Bigoode