For a few aspects of a project, HDF5 ("h5") storage would be ideal. However, the files are becoming massive and, frankly, we are running out of space.
This statement...
store.put(storekey, data, table=False, compression='gzip')
produces a file no smaller than...
store.put(storekey, data, table=False)
Is using compression even possible when going through Pandas?
... if it isn't possible, I don't mind using h5py; however, I'm uncertain what to use as a "datatype", since the DataFrame contains all sorts of types (strings, floats, ints, etc.)
Any help/insight would be appreciated!
See the docs regarding compression with HDFStore.
gzip
is not a valid compression option (and is silently ignored; that's a bug).
Try any of zlib, bzip2, lzo, blosc
(bzip2/lzo might need extra libraries installed).
See the PyTables docs for details on the various compression filters.
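To illustrate the answer above, here is a minimal sketch of passing one of the valid compressors via complevel/complib (the file path and sample DataFrame are made up for the example):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A mixed-type DataFrame like the asker's (hypothetical sample data).
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"] * 1000,
    "value": np.arange(3000, dtype="float64"),
    "count": np.arange(3000),
})

path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Pass complevel/complib (not compression=) when opening the store;
# 'zlib' is always available, 'blosc' is usually the fastest.
with pd.HDFStore(path, complevel=9, complib="zlib") as store:
    store.put("df", df, format="fixed")

# Read the data back to confirm the round trip.
with pd.HDFStore(path) as store:
    restored = store["df"]
```

Note that the compression settings belong to the store itself, so every object written while it is open is compressed the same way.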
Here's a semi-related comment.
I've been quite a fan of HDF5 in the past, but having hit a variety of complications, especially with Pandas HDFStore, I'm starting to think Exdir is a good idea.
http://exdir.readthedocs.io
You can write your data in a compressed format like this:
import pandas as pd
some_key = 'some_key'
with pd.HDFStore('path/to/your/h5/file.h5', complevel=9, complib='zlib') as store:
    store[some_key] = your_data_to_save_in_the_key
And you can read it back:
with pd.HDFStore('path/to/your/h5/file.h5', complevel=9, complib='zlib') as store:
    data_retrieved = store[some_key]
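If you'd rather not manage the store object yourself, DataFrame.to_hdf and pandas.read_hdf accept the same complevel/complib arguments. A small sketch, with a made-up path and sample data:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data for the demonstration.
df = pd.DataFrame({"a": range(1000), "b": ["text"] * 1000})
path = os.path.join(tempfile.mkdtemp(), "file.h5")

# Same compression knobs as HDFStore, applied per call.
df.to_hdf(path, key="some_key", complevel=9, complib="zlib")
data_retrieved = pd.read_hdf(path, "some_key")
```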