
Deleting hdf5 dataset using h5py


Is there any way to remove a dataset from an hdf5 file, preferably using h5py? Or alternatively, is it possible to overwrite a dataset while keeping the other datasets intact?

To my understanding, h5py can read/write hdf5 files in 5 modes

f = h5py.File("filename.hdf5",'mode') 

where mode can be r for read, r+ for read-write, a for read-write (creating the file if it doesn't exist), w for write/overwrite, and w- which is the same as w but fails if the file already exists. I have tried all of them, but none seems to work.

Any suggestions are much appreciated.

hsnee asked Aug 06 '15 17:08




2 Answers

Yes, this can be done.

with h5py.File(input, "a") as f:
    del f[datasetname]

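The file must be opened in a writable mode, for example append ("a", as above) or read-write ("r+"). Note that write mode ("w") truncates an existing file when it is opened, so it would discard the other datasets as well.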
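As a minimal sketch of the deletion described above, the following builds a small scratch file and removes one dataset from it while leaving the other untouched. The file name `demo.h5` and the dataset names `keep` and `drop` are made up for illustration; NumPy is assumed to be available alongside h5py.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file and dataset names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Build a small file with two datasets.
with h5py.File(path, "w") as f:
    f.create_dataset("keep", data=np.arange(5))
    f.create_dataset("drop", data=np.zeros(3))

# "r+" opens the existing file read-write without truncating it.
with h5py.File(path, "r+") as f:
    del f["drop"]

# Reopen read-only to confirm what is left.
with h5py.File(path, "r") as f:
    remaining = list(f.keys())
    kept = f["keep"][...]

print(remaining)
print(kept)
```

Opening with "r+" rather than "a" makes the call fail if the file does not exist, which may be preferable when the intent is to modify an existing file.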

As noted by @seppo-enarvi in the comments, the purpose of the previously recommended f.__delitem__(datasetname) function is to implement the del operator, so a dataset can be deleted with del f[datasetname].
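The question also asks about overwriting a dataset while keeping the others intact. One possible approach, sketched here with made-up names (`demo.h5`, datasets `a` and `b`), is to assign into the existing dataset when the new data has the same shape, and to delete and recreate the dataset when the shape changes. This is an illustration under those assumptions, not something the answer above prescribes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file and dataset names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("a", data=np.zeros(4))
    f.create_dataset("b", data=np.arange(3))

with h5py.File(path, "a") as f:
    # Same shape: write the new values into the existing dataset in place.
    f["a"][...] = np.ones(4)
    # Different shape: delete the old dataset, then recreate it under the same name.
    del f["b"]
    f.create_dataset("b", data=np.arange(10).reshape(2, 5))

# Reopen read-only to inspect the result.
with h5py.File(path, "r") as f:
    a_values = f["a"][...]
    b_shape = f["b"].shape

print(a_values)
print(b_shape)
```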

EnemyBagJones answered Oct 26 '22 23:10



I tried this out and the only way I could actually reduce the size of the file is by copying everything to a new file and just leaving out the dataset I was not interested in:

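import h5py

fs = h5py.File('WFA.h5', 'r')
fd = h5py.File('WFA_red.h5', 'w')
# Copy the file-level attributes.
for a in fs.attrs:
    fd.attrs[a] = fs.attrs[a]
# Copy every top-level object except those whose name contains 'SFS_TRANSITION'.
for d in fs:
    if 'SFS_TRANSITION' not in d:
        fs.copy(d, fd)
fs.close()
fd.close()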
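The copy approach above can be tried end to end on a throwaway file. The sketch below is self-contained and uses hypothetical names (`source.h5`, `reduced.h5`, and the datasets `SFS_TRANSITION_big` and `small`); it copies everything except the unwanted dataset and then prints both file sizes so they can be compared.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch files, for illustration only.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "source.h5")
dst = os.path.join(tmp, "reduced.h5")

# Source file: one large dataset to leave out, one small dataset to keep.
with h5py.File(src, "w") as f:
    f.attrs["note"] = "example"
    f.create_dataset("SFS_TRANSITION_big", data=np.zeros(200_000))
    f.create_dataset("small", data=np.arange(10))

with h5py.File(src, "r") as fs, h5py.File(dst, "w") as fd:
    # Copy the file-level attributes.
    for name, value in fs.attrs.items():
        fd.attrs[name] = value
    # Copy every top-level object except the unwanted one.
    for name in fs:
        if "SFS_TRANSITION" not in name:
            fs.copy(name, fd)
    copied = list(fd.keys())

src_size = os.path.getsize(src)
dst_size = os.path.getsize(dst)
print(copied)
print(src_size, dst_size)
```

Using `with` blocks here ensures both files are closed and flushed before their sizes are read.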
Felix answered Oct 27 '22 00:10
