
h5py, access data in Datasets in SVHN

Tags: python, h5py

I want to read the Street View House Numbers (SVHN) dataset using h5py.

In [117]: def printname(name):
     ...:     print(name)
     ...:

In [118]: data['/digitStruct'].visit(printname)
bbox
name

There are two groups in the data, bbox and name. name is the group that holds the file names, and bbox is the group that holds the width, height, top, left and label data.

How can I read all the data in the name and bbox groups?

I tried the following code from the docs, but it only returns HDF5 object references.

In [119]: for i in data['/digitStruct/name']:
     ...:     print(i[0])
     ...:
     ...:
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>

Python version: 3.5 and OS: Windows 10.

asked Dec 16 '16 02:12 by GoingMyWay



1 Answer

I'll answer my own question here. After reading the h5py docs, here is my code:

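def get_box_data(index, hdf5_data):
    """
    Get `left, top, width, height, label` for every digit in one picture.
    :param index: position of the picture in /digitStruct/bbox
    :param hdf5_data: open h5py.File for digitStruct.mat
    :return: dict mapping each attribute name to a list of ints, one per digit
    """
    meta_data = {'height': [], 'label': [], 'left': [], 'top': [], 'width': []}

    def collect_attrs(name, obj):
        vals = []
        if obj.shape[0] == 1:
            # Single digit: the value is stored directly in the dataset.
            vals.append(int(obj[0][0]))
        else:
            # Several digits: each row holds a reference to a 1x1 dataset.
            for k in range(obj.shape[0]):
                vals.append(int(hdf5_data[obj[k][0]][0][0]))
        meta_data[name] = vals

    # Each bbox entry is a reference to a group holding the five datasets.
    box = hdf5_data['/digitStruct/bbox'][index]
    hdf5_data[box[0]].visititems(collect_attrs)
    return meta_data


def get_name(index, hdf5_data):
    """Return the file name of the picture at `index`."""
    name = hdf5_data['/digitStruct/name']
    # Dataset.value was removed in h5py 3.0; [()] reads the whole dataset.
    return ''.join(chr(v[0]) for v in hdf5_data[name[index][0]][()])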

Here hdf5_data is the opened file, e.g. train_data = h5py.File('./train/digitStruct.mat', 'r'), and it works fine!
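
The key step in both functions is dereferencing: indexing the file with an HDF5 object reference returns the object it points to. Here is a minimal sketch of just that step. It assumes h5py 2.10 or later (for h5py.ref_dtype) and uses a throwaway in-memory file with made-up contents instead of the real digitStruct.mat; the dataset path "refs/a" is hypothetical.

```python
import h5py
import numpy as np

# In-memory file that is never written to disk (hypothetical stand-in data).
f = h5py.File("demo.h5", "w", driver="core", backing_store=False)

# Target dataset: the characters of "1.png" as uint16 codes, shape (5, 1).
codes = np.array([[ord(c)] for c in "1.png"], dtype=np.uint16)
target = f.create_dataset("refs/a", data=codes)

# Dataset of object references with shape (1, 1), laid out like /digitStruct/name.
name = f.create_dataset("digitStruct/name", shape=(1, 1), dtype=h5py.ref_dtype)
name[0, 0] = target.ref

ref = name[0][0]   # what the loop in the question prints: an HDF5 object reference
deref = f[ref]     # indexing the file with the reference yields the target dataset
decoded = "".join(chr(v[0]) for v in deref[()])
print(decoded)

f.close()
```

Printing ref on its own gives the same `<HDF5 object reference>` output as in the question; only f[ref] gives access to the stored values.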

Update

Here is some sample code that uses the two functions above:

import os
import h5py
import tqdm

# `folder` is the directory that contains digitStruct.mat
mat_data = h5py.File(os.path.join(folder, 'digitStruct.mat'), 'r')
size = mat_data['/digitStruct/name'].size

for _i in tqdm.tqdm(range(size)):
    pic = get_name(_i, mat_data)
    box = get_box_data(_i, mat_data)

The loop above shows how to get the name and the bbox data for each entry in the dataset.
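
get_box_data returns one list per attribute. If one record per digit is easier to work with, the lists can be zipped together. This is only a sketch: to_digit_records is a hypothetical helper name, and the sample values are made up to match the shape of the returned dict.

```python
def to_digit_records(meta_data):
    """Turn {'height': [...], 'label': [...], ...} into one dict per digit."""
    keys = ("label", "left", "top", "width", "height")
    columns = (meta_data[k] for k in keys)
    return [dict(zip(keys, values)) for values in zip(*columns)]


# Made-up values for a two-digit picture, shaped like get_box_data's result.
sample = {
    "height": [30, 30],
    "label": [1, 9],
    "left": [246, 323],
    "top": [77, 81],
    "width": [81, 96],
}
records = to_digit_records(sample)
print(records)
```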

answered Oct 05 '22 06:10 by GoingMyWay