Sparse array support in HDF5

I need to store a 512^3 array on disk, and I'm currently using HDF5. Since the array is sparse, a lot of disk space gets wasted.

Does HDF5 provide any support for sparse arrays?

asked Aug 23 '10 at 07:08 by andreabedini



People also ask

Why are HDF5 files so large?

This is probably due to your chunk layout: the smaller your chunks, the more your HDF5 file will be bloated, since every chunk needs its own entry in the chunk index. Try to find a balance between a chunk size that suits your use case and the size overhead that chunks introduce in the HDF5 file.
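To make the chunk trade-off concrete, here is a minimal h5py sketch, assuming h5py is installed; the file name `chunks_demo.h5` and the chunk shapes (64, 64, 64) and (8, 8, 8) are arbitrary illustrative choices. It writes a single element into an uncompressed, chunked 512^3 float32 dataset and asks HDF5 how much raw storage was allocated. With chunked layout only the chunk that was touched gets allocated.

```python
import os
import tempfile

import h5py


def allocated_bytes(chunk_shape):
    """Write one element into a 512^3 float32 dataset with the given chunk
    shape (no compression) and return the raw storage HDF5 allocated."""
    # Hypothetical file name in a throwaway directory
    path = os.path.join(tempfile.mkdtemp(), "chunks_demo.h5")
    with h5py.File(path, "w") as f:
        d = f.create_dataset("a", shape=(512, 512, 512), dtype="f4",
                             chunks=chunk_shape, fillvalue=-999.0)
        d[3, 4, 5] = 6
        # Low-level wrapper around H5Dget_storage_size
        return d.id.get_storage_size()


big = allocated_bytes((64, 64, 64))   # one 64^3 chunk of 4-byte floats
small = allocated_bytes((8, 8, 8))    # one 8^3 chunk of 4-byte floats
print(big, small)
```

Smaller chunks waste less space around an isolated write, but a dataset that is mostly written ends up split into many tiny chunks, each with its own index entry. That is the balance described above.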

Does HDF5 compress data?

Yes. One of the most powerful features of HDF5 is its ability to store and modify compressed data. Compression is applied chunk by chunk, so a compressed dataset must use chunked storage.
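A minimal h5py sketch of per-dataset compression follows, assuming h5py and NumPy are installed. The file and dataset names are hypothetical, and gzip level 4 with the shuffle filter is just one reasonable setting. When a compression filter is requested without an explicit chunk shape, h5py picks one automatically, so the dataset ends up chunked.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "compressed_demo.h5")
with h5py.File(path, "w") as f:
    # gzip level 4 plus the shuffle filter, which can help numeric data compress
    d = f.create_dataset("values", data=np.zeros((256, 256), dtype="f8"),
                         compression="gzip", compression_opts=4, shuffle=True)
    compression = d.compression       # name of the compression filter
    level = d.compression_opts        # gzip level that was requested
    chunks = d.chunks                 # chunk shape h5py chose automatically
    stored = d.id.get_storage_size()  # bytes actually stored for the data

print(compression, level, chunks, stored)
```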

What is HDF5 used for?

The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. HDF5 uses a file-directory-like structure that lets you organize data within the file in many different structured ways, as you might do with files on your computer.
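The directory-like structure maps onto groups (folders) and datasets (files). Here is a small h5py sketch, assuming h5py and NumPy are installed. The group path `simulation/run1`, the dataset `density`, and the `timestep` attribute are made-up names used only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
with h5py.File(path, "w") as f:
    # create_group accepts a path; intermediate groups are created as needed
    run = f.create_group("simulation/run1")
    run.create_dataset("density", data=np.ones((4, 4, 4), dtype="f4"))
    run.attrs["timestep"] = 0.01  # small metadata stored next to the data

    top_level = list(f.keys())                     # members of the root group
    has_density = "simulation/run1/density" in f  # look up by full path
    shape = f["simulation/run1/density"].shape

print(top_level, has_density, shape)
```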

Is HDF5 binary?

The HDF5 file format is a cross-platform binary format for storing scientific data. HDF5 allows you to reduce the size of the file data by compressing repeated values. This allows your data to be read and written much faster than if you stored the data as ASCII (plain text) files.
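As a rough illustration of the binary-versus-text point, the sketch below stores the same 100 x 100 array of doubles once with h5py and once as text with `numpy.savetxt`, then compares file sizes. It assumes h5py and NumPy are installed, and the file names are hypothetical. It only compares size and exactness. Read and write speed is not measured here.

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
data = np.random.default_rng(0).random((100, 100))  # 10,000 float64 values

h5_path = os.path.join(tmp, "data.h5")    # hypothetical file names
txt_path = os.path.join(tmp, "data.txt")

with h5py.File(h5_path, "w") as f:
    f.create_dataset("data", data=data)   # stored as raw 8-byte doubles
np.savetxt(txt_path, data)                # default format '%.18e', about 25 chars per value

with h5py.File(h5_path, "r") as f:
    exact = np.array_equal(f["data"][...], data)  # binary storage is lossless

h5_size = os.path.getsize(h5_path)
txt_size = os.path.getsize(txt_path)
print(h5_size, txt_size, exact)
```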


1 Answer

One workaround is to create the dataset with a compression option. For example, in Python using h5py:

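```python
import h5py

# gzip level 9; h5py chooses a chunk shape automatically when compression is set
with h5py.File('my.h5', 'w') as f:
    d = f.create_dataset('a', dtype='f', shape=(512, 512, 512), fillvalue=-999.,
                         compression='gzip', compression_opts=9)
    d[3, 4, 5] = 6  # only the chunk containing this element gets stored
```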

The resulting file is 4.5 KB. Without compression, the same dataset would take about 512 MB (512^3 elements times 4 bytes for the float32 dtype 'f'). That is a 99.999% reduction, because almost all of the data equal the fill value, -999 (or whatever fillvalue you choose).


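The equivalent can be achieved with the HDF5 C++ API by calling H5::DSetCreatPropList::setDeflate(9) on the dataset creation property list, together with setChunk, since compression requires a chunked layout. An example is shown in h5group.cpp.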
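The Python version can be checked directly. The sketch below re-creates the answer's dataset in a temporary directory (an assumption; the answer writes `my.h5` to the working directory), reopens it read-only, and inspects it. Elements that were never written come back as the fill value, so you still get ordinary dense-array indexing even though almost nothing is stored.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "my.h5")
with h5py.File(path, "w") as f:
    d = f.create_dataset("a", dtype="f", shape=(512, 512, 512), fillvalue=-999.,
                         compression="gzip", compression_opts=9)
    d[3, 4, 5] = 6

with h5py.File(path, "r") as f:
    d = f["a"]
    written = float(d[3, 4, 5])          # the one explicitly written value
    unwritten = float(d[500, 500, 500])  # chunk never written: fill value
    stored = d.id.get_storage_size()     # compressed bytes for the raw data
    dense = d.size * d.dtype.itemsize    # bytes a fully allocated copy would need

print(written, unwritten, stored, dense)
print(f"file size on disk: {os.path.getsize(path)} bytes")
```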

answered Oct 20 '22 at 00:10 by Mike T
