
How to write hdf5 files without overwriting?

Tags: python, h5py

Sorry if this is a very basic question on h5py.

I was reading the documentation, but I didn't find a similar example.

I'm trying to create multiple HDF5 datasets with Python, but after I close the file and reopen it, the data written earlier is overwritten.

Let's say I do the following:

import numpy as np
import h5py
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
f["data1"][()]
f.close()

I get

KeyError: "Unable to open object (Object 'data1' doesn't exist)"

Appending works, but it requires opening the file in 'w' mode first and then reopening it in 'a' mode, so I need two different statements:

import numpy as np
import h5py
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
print(f["data1"][()])
f.close()

If I open the file in 'a' mode in both cases:

import numpy as np
import h5py
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
print(f['data1'][()])
f.close()

I get

RuntimeError: Unable to create link (Name already exists)

According to the documentation, data should be stored contiguously, but I didn't find how to avoid overwriting data.

How can I store data in a previously closed HDF5 file using only a single statement?

Asked by ilciavo on Aug 09 '15 at 07:08


1 Answer

If you want to create a unique file on each run, consider giving each file a distinct name, for example by adding a timestamp to the file name. A very simple way is to use the datetime module's now() and strftime() methods to build the name. Example -

import datetime
filename = "test_{}.hdf5".format(datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S"))

Then you can use that filename to open the file.
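As a minimal sketch of that last step, assuming h5py and NumPy are installed: the helper names `timestamped_name` and `write_run` are hypothetical, and the temporary directory is only there so the example leaves no files behind. Mode 'x' is used instead of 'w' so that an existing file is never truncated.

```python
import datetime
import os
import tempfile

import h5py
import numpy as np


def timestamped_name(prefix="test", when=None):
    """Build a name such as 'test_2015_08_09_13_33_43.hdf5' (hypothetical helper)."""
    when = when or datetime.datetime.now()
    return "{}_{}.hdf5".format(prefix, when.strftime("%Y_%m_%d_%H_%M_%S"))


def write_run(directory, name, data):
    """Write one dataset to a new, timestamped file and return its path."""
    path = os.path.join(directory, timestamped_name())
    # 'x' creates the file and fails if it already exists, so nothing is overwritten.
    with h5py.File(path, "x") as f:
        f.create_dataset(name, data=data)
    return path


# Usage: each call made in a different second ends up in its own file.
run_dir = tempfile.mkdtemp()  # throwaway directory for this sketch
path = write_run(run_dir, "data1", np.ones(10))
with h5py.File(path, "r") as f:
    stored = f["data1"][()]  # [()] reads the whole dataset into a NumPy array
```

Note that the timestamp has one-second resolution, so two runs started within the same second would produce the same name; with mode 'x' the second one fails instead of silently replacing the first.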


Demo -

>>> import datetime
>>> filename = "test_{}.hdf5".format(datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S"))
>>> filename
'test_2015_08_09_13_33_43.hdf5'
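If the goal is instead to keep adding datasets to one file, a possible approach is to always open it in 'a' mode and check whether the dataset name is already present before creating it. The following is only a sketch under those assumptions: `add_dataset` is a hypothetical helper, and skipping an existing name (rather than replacing it) is a choice made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def add_dataset(path, name, data):
    """Add a dataset to an HDF5 file while keeping what the file already holds."""
    # Mode 'a' opens the file read/write and creates it only if it is missing;
    # mode 'w' would truncate an existing file.
    with h5py.File(path, "a") as f:
        if name in f:
            # create_dataset raises when the name already exists, so skip it.
            # (Skipping instead of replacing is an assumption of this sketch.)
            return False
        f.create_dataset(name, data=data)
        return True


# Usage: the same statement works for the first write and for later ones.
path = os.path.join(tempfile.mkdtemp(), "test.hdf5")  # throwaway location
add_dataset(path, "data1", np.ones(10))
add_dataset(path, "data0", np.zeros(10))

with h5py.File(path, "r") as f:
    names = sorted(f.keys())
    first = f["data1"][()]  # [()] reads the whole dataset into a NumPy array
```

The `.value` attribute used in the question was removed in h5py 3.0, which is why the sketch reads the data with `[()]`.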
Answered by Anand S Kumar on Oct 05 '22 at 08:10