Sorry if this is a very basic question on h5py.
I was reading the documentation, but I didn't find a similar example.
I'm trying to create multiple HDF5 datasets with Python, but it turns out that after I close the file, the data gets overwritten.
Let's say I do the following:
import numpy as np
import h5py
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
print(f["data1"][()])  # Dataset.value was removed in h5py 3.0; [()] reads the whole dataset
f.close()
I get
KeyError: "Unable to open object (Object 'data1' doesn't exist)"
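The KeyError comes from the file mode: in h5py, mode 'w' creates the file and truncates it if it already exists, so the second open throws away 'data1'. Mode 'a' opens an existing file for reading and writing (creating it if it is missing), so existing datasets are kept. A minimal sketch of the difference, assuming h5py 3.x and a writable working directory:

```python
import numpy as np
import h5py

# 'w' creates the file, truncating it if it already exists
with h5py.File("test.hdf5", "w") as f:
    f.create_dataset("data1", data=np.ones(10))

# Opening with 'w' again wipes 'data1' before 'data0' is written
with h5py.File("test.hdf5", "w") as f:
    f.create_dataset("data0", data=np.zeros(10))

with h5py.File("test.hdf5", "r") as f:
    print(sorted(f.keys()))  # only 'data0' is left

# 'a' reopens the existing file read/write, so 'data0' is kept
with h5py.File("test.hdf5", "a") as f:
    f.create_dataset("data1", data=np.ones(10))

with h5py.File("test.hdf5", "r") as f:
    print(sorted(f.keys()))  # both datasets are present now
```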
If I append data instead, I have to open the file in 'w' mode first and then in 'a' mode, which takes two different statements:
import numpy as np
import h5py
f = h5py.File('test.hdf5', 'w')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
print(f["data1"][()])  # Dataset.value was removed in h5py 3.0; [()] reads the whole dataset
f.close()
If I open the file in 'a' mode in both cases:
import numpy as np
import h5py
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data1', data = np.ones(10))
f.close()
f = h5py.File('test.hdf5', 'a')
f.create_dataset('data0', data = np.zeros(10))
f.close()
f = h5py.File('test.hdf5', 'r')
print(f['data1'][()])  # Dataset.value was removed in h5py 3.0; [()] reads the whole dataset
f.close()
RuntimeError: Unable to create link (Name already exists)
According to the documentation, data should be stored contiguously, but I couldn't find out how to avoid overwriting it.
How can I store data in a previously closed HDF5 file using only a single statement?
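One way to get a single statement that works on every run is to always open the file in 'a' mode and decide what to do when the name is already taken, since that is what triggers "Unable to create link (name already exists)". The helper below is a minimal sketch; `save_dataset` and its `overwrite` flag are hypothetical names, not part of h5py. It relies only on `name in group`, `del group[name]` and `create_dataset`:

```python
import numpy as np
import h5py

def save_dataset(path, name, data, overwrite=True):
    """Hypothetical helper: write `data` under `name` in the HDF5 file at `path`.

    Mode 'a' creates the file on the first call and reopens it on later
    calls, so the same statement works whether or not the file exists.
    """
    with h5py.File(path, "a") as f:
        if name in f:
            if not overwrite:
                raise ValueError(f"{name!r} already exists in {path}")
            # Unlink the old dataset so create_dataset does not fail with
            # "Unable to create link (name already exists)"
            del f[name]
        f.create_dataset(name, data=data)

save_dataset("test.hdf5", "data1", np.ones(10))
save_dataset("test.hdf5", "data0", np.zeros(10))
# Re-running no longer raises: 'data1' is simply replaced
save_dataset("test.hdf5", "data1", np.ones(10))

with h5py.File("test.hdf5", "r") as f:
    print(sorted(f.keys()))
    print(f["data1"][()])
```

Note that deleting a dataset only unlinks it; HDF5 does not shrink the file afterwards. If the new array has the same shape and dtype, writing `f[name][...] = data` overwrites the existing dataset in place instead.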
HDF5 files are organized in a hierarchical structure, with two primary structures: groups and datasets. HDF5 group: a grouping structure containing instances of zero or more groups or datasets, together with supporting metadata. HDF5 dataset: a multidimensional array of data elements, together with supporting metadata.
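As a rough illustration of that hierarchy in h5py, the sketch below creates one group holding two datasets and attaches an attribute as a simple piece of metadata. The file name `structure.hdf5`, the group name `run1` and the attribute are made up for the example:

```python
import numpy as np
import h5py

with h5py.File("structure.hdf5", "w") as f:
    # A group is a container that can hold datasets and other groups
    run = f.create_group("run1")
    # Datasets hold the actual arrays
    run.create_dataset("data1", data=np.ones(10))
    run.create_dataset("data0", data=np.zeros(10))
    # Attributes are small named values attached to a group or dataset
    run.attrs["description"] = "example run"

with h5py.File("structure.hdf5", "r") as f:
    # visititems walks every object below the root group
    f.visititems(lambda name, obj: print(name, type(obj).__name__))
```

Objects can then be addressed by path, for example `f["run1/data1"]`.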
If you want to create a unique file on each run, you should give each file a unique name. One way is to add a timestamp to the file name; a very simple approach is to use the datetime module with its now() and strftime() methods to build the name. Example:
import datetime
filename = "test_{}.hdf5".format(datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S"))
Then you can use that filename to open the file.
Demo:
>>> import datetime
>>> filename = "test_{}.hdf5".format(datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S"))
>>> filename
'test_2015_08_09_13_33_43.hdf5'
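Putting the timestamped name together with h5py might look like the sketch below. Opening with mode 'x' (create, fail if the file exists) is an assumption on top of the answer: it guards against two runs started within the same second silently overwriting each other, which plain 'w' would allow:

```python
import datetime

import numpy as np
import h5py

# One file per run; the timestamp has one-second resolution
filename = "test_{}.hdf5".format(
    datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S"))

# 'x' creates a new file and raises if one with that name already exists
with h5py.File(filename, "x") as f:
    f.create_dataset("data1", data=np.ones(10))
    f.create_dataset("data0", data=np.zeros(10))

with h5py.File(filename, "r") as f:
    print(filename, sorted(f.keys()))
```

If runs can start more often than once per second, adding "%f" (microseconds) to the format string makes collisions much less likely.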