For a few aspects of a project, HDF5 ("h5") storage would be ideal. However, the files are becoming massive and, frankly, we are running out of space.
This statement...
store.put(storekey, data, table=False, compression='gzip')
produces a file no smaller than...
store.put(storekey, data, table=False)
Is using compression even possible when going through Pandas?
... if it isn't possible, I don't mind using h5py; however, I'm uncertain what to use as a "datatype", since the DataFrame contains all sorts of types (strings, floats, ints, etc.)
Any help/insight would be appreciated!
See the docs regarding compression with HDFStore.
gzip
is not a valid compression option (and is silently ignored; that's a bug).
Try any of zlib, bzip2, lzo, blosc
(bzip2/lzo might need extra libraries installed).
See the PyTables docs for details on the various compression filters.
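To illustrate the answer above, here is a minimal sketch of passing one of the valid compressors via complevel/complib (the file path and sample DataFrame are made up for the example):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A mixed-type DataFrame like the asker's (hypothetical sample data).
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"] * 1000,
    "value": np.arange(3000, dtype="float64"),
    "count": np.arange(3000),
})

path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Pass complevel/complib (not compression=) when opening the store;
# 'zlib' is always available, 'blosc' is usually the fastest.
with pd.HDFStore(path, complevel=9, complib="zlib") as store:
    store.put("df", df, format="fixed")

# Read the data back to confirm the round trip.
with pd.HDFStore(path) as store:
    restored = store["df"]
```

Note that the compression settings belong to the store itself, so every object written while it is open is compressed the same way.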
Here's a semi-related comment.
I've been quite a fan of HDF5 in the past, but having hit a variety of complications, especially with Pandas HDFStore, I'm starting to think Exdir is a good idea.
http://exdir.readthedocs.io
You can write your data in a compressed format like this:
import pandas as pd
some_key = 'some_key'
with pd.HDFStore('path/to/your/h5/file.h5', complevel=9, complib='zlib') as store:
    store[some_key] = your_data_to_save_in_the_key
And you can read it back:
with pd.HDFStore('path/to/your/h5/file.h5', complevel=9, complib='zlib') as store:
    data_retrieved = store[some_key]
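If you'd rather not manage the store object yourself, DataFrame.to_hdf and pandas.read_hdf accept the same complevel/complib arguments. A small sketch, with a made-up path and sample data:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data for the demonstration.
df = pd.DataFrame({"a": range(1000), "b": ["text"] * 1000})
path = os.path.join(tempfile.mkdtemp(), "file.h5")

# Same compression knobs as HDFStore, applied per call.
df.to_hdf(path, key="some_key", complevel=9, complib="zlib")
data_retrieved = pd.read_hdf(path, "some_key")
```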