I'm currently working on a project involving compression of HDF5 datasets and recently began using h5py. I followed the basic tutorials and was able to open, create, and compress a file while it was being created. However, I've been unsuccessful when it comes to compressing an existing file (which is the aim of my work).
I've tried opening files using 'r+' and then compressing chunked datasets, but the file sizes have remained the same.
Any suggestions on what commands to use or am I going about things the wrong way?
The HDF Group provides a set of tools to convert, display, analyze, edit, and repack HDF5 files.
You can compress an existing HDF5 file with the h5repack utility, which can also change the chunk size.
h5repack can be used from the command line:
h5repack file1 file2
// rewrites file1, reclaiming unused space, and saves the result as file2
h5repack -v -l CHUNK=1024 file1 file2
// applies chunking with 1024-element chunks to file1
h5repack -v -l CHUNK=1024 -f GZIP=5 file1 file2
// makes 1024-element chunks and compresses them with GZIP level 5
h5repack --help
// shows the available help documentation
Detailed documentation is also available.
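Since you are already working in Python, you can also drive h5repack from a script. A minimal sketch (assuming the h5repack binary is on your PATH; the file names are placeholders):

import subprocess

# Repack file1.h5 into file2.h5 with 1024-element chunks and GZIP level 5.
# Assumes h5repack is installed and on PATH; adjust the names to your files.
subprocess.run(
    ['h5repack', '-l', 'CHUNK=1024', '-f', 'GZIP=5', 'file1.h5', 'file2.h5'],
    check=True,
)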
Compression is very easy to use in h5py. Check out the Wiki HowTo and Compression guides. Basically, it would be something like:
ds = myfile.create_dataset('ds', shape, dtype, compression='lzf')
There are also some issues with how you pick chunk sizes to optimize file size and access speed; see the Compression guide I linked to.
I do not remember which compression, if any, is on by default (I believe none is applied unless you ask for it).
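Since a dataset's storage settings are fixed when it is created, compressing an existing file with h5py means copying the data into a new file with compression enabled. A minimal sketch (the file and dataset names are placeholders, and it only handles top-level datasets, not groups or attributes):

import h5py

# Copy every top-level dataset from input.h5 into a new, compressed output.h5.
# Storage settings cannot be changed in place, hence the copy.
with h5py.File('input.h5', 'r') as src, h5py.File('output.h5', 'w') as dst:
    for name, obj in src.items():
        if isinstance(obj, h5py.Dataset):
            dst.create_dataset(
                name,
                data=obj[...],        # read the existing data into memory
                chunks=True,          # let h5py choose a chunk shape
                compression='gzip',   # or 'lzf' for faster, lighter compression
                compression_opts=4,   # gzip level 1-9
            )

Note that obj[...] reads the whole dataset into memory; for very large datasets you would want to copy chunk by chunk instead.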