I want to read the Street View House Numbers (SVHN) dataset using h5py.
In [117]: def printname(name):
     ...:     print(name)
     ...:
In [118]: data['/digitStruct'].visit(printname)
bbox
name
There are two groups in the data, bbox and name: name is the group that holds the file name data, and bbox is the group that holds the width, height, top, left and label data.
How can I visit all the data in the name and bbox groups?
I have tried the following code from the docs, but it only returns HDF5 object references.
In [119]: for i in data['/digitStruct/name']:
     ...:     print(i[0])
     ...:
     ...:
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
Python version: 3.5 and OS: Windows 10.
Open an HDF5/H5 file in HDFView: to begin, open the HDFView application. Within HDFView, select File --> Open and navigate to the folder where you saved the NEONDSTowerTemperatureData.hdf5 file on your computer. Open this file in HDFView.
An HDF5 dataset is an object composed of a collection of data elements, or raw data, and metadata that stores a description of the data elements, data layout, and all other information necessary to write, read, and interpret the stored data.
An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is: groups work like dictionaries, and datasets work like NumPy arrays.
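To make the dictionary/array analogy concrete, here is a minimal sketch on a throwaway in-memory file. The file name demo.h5 and the digits/labels names are made up for illustration and are not part of the SVHN layout.

```python
import h5py
import numpy as np

# In-memory HDF5 file: driver='core' with backing_store=False writes nothing to disk.
# The names 'demo.h5', 'digits' and 'labels' are hypothetical, for illustration only.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    grp = f.create_group('digits')  # a group, like a folder
    grp.create_dataset('labels', data=np.array([1, 9, 2]))

    # Groups behave like dictionaries: keys, membership tests, item access.
    group_keys = list(f.keys())
    has_labels = 'labels' in f['digits']

    # Datasets behave like NumPy arrays: shape, dtype, slicing.
    labels = f['digits/labels']
    shape = labels.shape
    first_two = labels[:2].tolist()

print(group_keys, has_labels, shape, first_two)
```

Slicing a dataset reads only the selected elements from the file, which is why the values are copied out before the file is closed.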
I'll answer my own question here. After reading the h5py docs, here is my code:
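def get_box_data(index, hdf5_data):
    """
    Get `left, top, width, height` and `label` of each digit in one picture.
    :param index: index of the picture in the dataset
    :param hdf5_data: the opened digitStruct.mat file (h5py.File)
    :return: dict mapping each attribute name to a list of values
    """
    meta_data = dict()
    meta_data['height'] = []
    meta_data['label'] = []
    meta_data['left'] = []
    meta_data['top'] = []
    meta_data['width'] = []

    def print_attrs(name, obj):
        vals = []
        if obj.shape[0] == 1:
            # single digit: the value is stored directly in the dataset
            vals.append(int(obj[0][0]))
        else:
            # several digits: each entry is a reference to another dataset
            for k in range(obj.shape[0]):
                vals.append(int(hdf5_data[obj[k][0]][0][0]))
        meta_data[name] = vals

    box = hdf5_data['/digitStruct/bbox'][index]
    hdf5_data[box[0]].visititems(print_attrs)
    return meta_data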
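def get_name(index, hdf5_data):
    name = hdf5_data['/digitStruct/name']
    # The file name is stored as an array of character codes; `[()]` reads the whole dataset.
    # (`.value` was deprecated in h5py 2.x and removed in h5py 3.0.)
    return ''.join([chr(v[0]) for v in hdf5_data[name[index][0]][()]])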
Here, hdf5_data is the opened file, e.g. train_data = h5py.File('./train/digitStruct.mat'), and it works fine!
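The key step in both functions is dereferencing: indexing the file with an HDF5 object reference returns the object it points to. Below is a minimal sketch of that step on a synthetic in-memory file. The file name refs_demo.h5 and the dataset names hidden/name_0 and names are made up for illustration; this is not the real SVHN layout, only the same mechanism.

```python
import h5py
import numpy as np

# Hypothetical in-memory file; nothing is written to disk.
with h5py.File('refs_demo.h5', 'w', driver='core', backing_store=False) as f:
    # Store a file name as a column of character codes, one code per row.
    codes = np.array([[ord(c)] for c in '1.png'], dtype=np.uint16)
    target = f.create_dataset('hidden/name_0', data=codes)

    # A dataset whose elements are object references to other datasets.
    # It is 1-D here; the answer's code indexes name[index][0], i.e. a 2-D dataset.
    names = f.create_dataset('names', shape=(1,), dtype=h5py.ref_dtype)
    names[0] = target.ref

    ref = names[0]  # prints as <HDF5 object reference>
    is_ref = isinstance(ref, h5py.Reference)

    resolved = f[ref]  # indexing the file with a reference dereferences it
    file_name = ''.join(chr(v[0]) for v in resolved[()])

print(is_ref, file_name)
```

Printing the reference itself only shows the placeholder from the question; the data becomes visible once the reference is used as a key on the file object.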
Here is some sample code that uses the two functions above:
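import os
import h5py
import tqdm

# `folder` is the directory that contains digitStruct.mat, e.g. './train'
mat_data = h5py.File(os.path.join(folder, 'digitStruct.mat'), 'r')
size = mat_data['/digitStruct/name'].size
for _i in tqdm.tqdm(range(size)):
    pic = get_name(_i, mat_data)
    box = get_box_data(_i, mat_data)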
The loop above shows how to get the file name and the bbox data of each entry in the dataset.