Does the following code read part of a dataset without loading the whole thing into memory (the whole dataset will not fit into memory), and get the size of the dataset without loading the data, using h5py in Python? If not, how can I do that?
import h5py

h5 = h5py.File('myfile.h5', 'r')
mydata = h5.get('matirx')  # are all data loaded into memory by using h5.get?
part_of_mydata = mydata[1000:11000, :]
size_data = mydata.shape
Thanks.
This is probably due to your chunk layout: the smaller the chunks, the more bloated your HDF5 file becomes. Try to find a balance between a chunk size that suits your use case and the size overhead that chunks introduce in the HDF5 file.
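As a rough illustration of the chunk-size trade-off, the sketch below writes the same array twice with different chunk shapes and prints the resulting file sizes. The dataset name "matrix", the array size, and the two chunk shapes are assumptions chosen for the demo, not values from the post; the actual overhead depends on your HDF5 version and settings.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo data: a 100 x 100 array of float64 (about 80 kB raw).
data = np.arange(100 * 100, dtype="f8").reshape(100, 100)

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    # Two illustrative chunk shapes: one chunk per element vs. 50-row slabs.
    for label, chunks in [("tiny", (1, 1)), ("moderate", (50, 100))]:
        path = os.path.join(tmp, f"{label}.h5")
        with h5py.File(path, "w") as f:
            dset = f.create_dataset("matrix", data=data, chunks=chunks)
            stored_chunks = dset.chunks  # chunk shape recorded in the file
        # Measure the file only after it has been closed and flushed.
        sizes[label] = (stored_chunks, os.path.getsize(path))

for label, (stored_chunks, nbytes) in sizes.items():
    print(label, stored_chunks, nbytes, "bytes")
```

Every chunk needs its own index entry in the file, so a layout with many tiny chunks would be expected to produce the larger file here. Pick a chunk shape that matches how you slice the data.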
To view an HDF5/H5 file stored on your computer, open it in HDFView. If you click on the name of the HDF5 file in the left-hand window of HDFView, you can view metadata for the file.
The h5py package is a Pythonic interface to the HDF5 binary data format. HDF5 lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays.
get (or indexing) fetches a reference to the Dataset in the file, but does not load any data.
In [789]: list(f.keys())
Out[789]: ['dset', 'dset1', 'vset']
In [790]: d=f['dset1']
In [791]: d
Out[791]: <HDF5 dataset "dset1": shape (2, 3, 10), type "<f8">
In [792]: d.shape # shape of dataset
Out[792]: (2, 3, 10)
In [793]: arr=d[:,:,:5] # indexing the set fetches part of the data
In [794]: arr.shape
Out[794]: (2, 3, 5)
In [795]: type(d)
Out[795]: h5py._hl.dataset.Dataset
In [796]: type(arr)
Out[796]: numpy.ndarray
d, the Dataset, is array-like, but not actually a numpy array.
Fetch the whole Dataset with:
In [798]: arr = d[:]
In [799]: type(arr)
Out[799]: numpy.ndarray
Exactly how much of the file it has to read to fetch your slice depends on the slicing, data layout, chunking, and other things that generally aren't under your control, and shouldn't worry you.
Note also that when reading one dataset I'm not loading the others. Same would apply to groups.
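For a dataset that will not fit into memory, one way to use this is to walk over it in row blocks, so that only one block is held at a time. This is a minimal sketch, not the only approach: the helper blockwise_sum, the dataset name "matrix", the file name demo.h5, and the block size are all made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def blockwise_sum(path, name, block_rows):
    """Sum a 2-D dataset while reading at most block_rows rows at a time."""
    total = 0.0
    with h5py.File(path, "r") as f:
        dset = f[name]          # a Dataset handle; no data is read yet
        n_rows = dset.shape[0]  # shape comes from metadata, not from the data
        for start in range(0, n_rows, block_rows):
            stop = min(start + block_rows, n_rows)
            block = dset[start:stop, :]  # reads only these rows into a numpy array
            total += float(block.sum())
    return total


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    # Small stand-in for a dataset that would really be too large for memory.
    data = np.arange(1000 * 4, dtype="f8").reshape(1000, 4)
    with h5py.File(path, "w") as f:
        f.create_dataset("matrix", data=data)

    result = blockwise_sum(path, "matrix", block_rows=128)
    expected = float(data.sum())

print(result == expected)
```

The block size is a trade-off between memory use and the number of read calls. If the dataset is chunked, blocks that line up with the chunk shape tend to avoid reading the same chunk more than once.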
http://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data