
How to differentiate between HDF5 datasets and groups with h5py?

Tags: python, hdf5, h5py

I use the Python package h5py (version 2.5.0) to access my hdf5 files.

I want to traverse the content of a file and do something with every dataset.

Using the visit method:

import h5py

def print_it(name):
    dset = f[name]
    print(dset)
    print(type(dset))


with h5py.File('test.hdf5', 'r') as f:
    f.visit(print_it)

for a test file I obtain:

<HDF5 group "/x" (1 members)>
<class 'h5py._hl.group.Group'>
<HDF5 dataset "y": shape (100, 100, 100), type "<f8">
<class 'h5py._hl.dataset.Dataset'>

which tells me that the file contains one group and one dataset. However, apart from using type(), there is no obvious way to differentiate between datasets and groups. The h5py documentation unfortunately says nothing on this topic. Its examples always assume that you know beforehand which objects are groups and which are datasets, for example because you created them yourself.

I would like to have something like:

f = h5py.File(..)
for key in f.keys():
    x = f[key]
    print(x.is_group(), x.is_dataset()) # does not exist

How can I differentiate between groups and datasets when reading an unknown hdf5 file in Python with h5py? How can I get a list of all datasets, of all groups, of all links?

Trilarion asked Dec 17 '15 08:12


People also ask

What is the difference between HDF5 group and HDF5 dataset?

Within one HDF5 file, you can store a similar set of data organized in the same way that you might organize files and folders on your computer. However, in an HDF5 file, what we call "directories" or "folders" on our computers are called groups, and what we call files on our computers are called datasets.

What is an HDF5 group?

HDF5 is a specification and format for creating hierarchical data from very large data sources. In HDF5 the data is organized in a file. The file object acts as the / (root) group of the hierarchy. Similar to the UNIX file system, in HDF5 the datasets and their groups are organized as an inverted tree.

How do I check my HDF5 data?

Open the HDF5/H5 file in HDFView. If you click on the name of the HDF5 file in the left-hand window of HDFView, you can view metadata for the file. This will be located in the bottom window of the application.

What is HDF5 dataset?

An HDF5 dataset is an object composed of a collection of data elements, or raw data, and metadata that stores a description of the data elements, data layout, and all other information necessary to write, read, and interpret the stored data.


5 Answers

Unfortunately, there is no built-in method in the h5py API to check this, but you can simply check the type of the item with is_dataset = isinstance(item, h5py.Dataset).

To list all the content of the file (except the file's attributes, though) you can use Group.visititems with a callable that takes the name and the instance of an item.
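
As a minimal sketch of the visititems approach described above: the helper name classify and the in-memory demo file are illustrative assumptions, not part of h5py or of the question's test file. The callable receives each object's path relative to the starting group together with the object itself, so no extra lookup is needed.

```python
import h5py

def classify(h5obj):
    """Return (groups, datasets) as lists of paths found below h5obj."""
    groups, datasets = [], []

    def collect(name, obj):
        # visititems passes the relative path and the object itself
        if isinstance(obj, h5py.Dataset):
            datasets.append(name)
        elif isinstance(obj, h5py.Group):
            groups.append(name)
        # returning None lets the traversal continue

    h5obj.visititems(collect)
    return groups, datasets

# Hypothetical demo file kept in memory (nothing is written to disk)
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as f:
    f.create_group('x').create_dataset('y', shape=(2, 2), dtype='f8')
    groups, datasets = classify(f)
    print(groups)    # ['x']
    print(datasets)  # ['x/y']
```

Since h5py.File is itself a subclass of h5py.Group, the same function works on the file object or on any subgroup.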

Gall answered Oct 06 '22 06:10


The answers by Gall and James Smith point to the general solution, but the traversal of the hierarchical HDF5 structure and the filtering of all datasets still had to be written. I did it with yield from, which is available in Python 3.3+ and works quite nicely:

import h5py

def h5py_dataset_iterator(g, prefix=''):
    for key, item in g.items():
        path = '{}/{}'.format(prefix, key)
        if isinstance(item, h5py.Dataset): # test for dataset
            yield (path, item)
        elif isinstance(item, h5py.Group): # test for group (go down)
            yield from h5py_dataset_iterator(item, path)

with h5py.File('test.hdf5', 'r') as f:
    for (path, dset) in h5py_dataset_iterator(f):
        print(path, dset)

Trilarion answered Oct 06 '22 05:10


For example, if you want to print the structure of an HDF5 file, you can use the following code:

import h5py

# Recursively print the members of a group, indenting one level per depth
def h5printR(item, leading = ''):
    for key in item:
        if isinstance(item[key], h5py.Dataset):
            # Datasets are leaves: print name and shape
            print(leading + key + ': ' + str(item[key].shape))
        else:
            # Anything else is treated as a group and descended into
            print(leading + key)
            h5printR(item[key], leading + '  ')

# Print structure of a `.h5` file
def h5print(filename):
    with h5py.File(filename, 'r') as h:
        print(filename)
        h5printR(h, '  ')

Example

>>> h5print('/path/to/file.h5')

/path/to/file.h5
  test
    repeats
      cell01: (2, 300)
      cell02: (2, 300)
      cell03: (2, 300)
      cell04: (2, 300)
      cell05: (2, 300)
    response
      firing_rate_10ms: (28, 30011)
    stimulus: (300, 50, 50)
    time: (300,)

Yas answered Oct 06 '22 06:10


Because h5py groups expose a dictionary-like interface, you can use the values() method to access the items themselves rather than just their names. So you can filter them with a list comprehension:

datasets = [item for item in f["Data"].values() if isinstance(item, h5py.Dataset)]

Doing this recursively should be simple enough.
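
A rough sketch of what the recursive version could look like, assuming a group named "Data" as in the line above. The function name all_datasets and the in-memory demo file are made up for illustration. It relies on the name attribute of each dataset for the full path instead of building the path by hand.

```python
import h5py

def all_datasets(group):
    """Recursively collect every h5py.Dataset below group."""
    found = []
    for item in group.values():
        if isinstance(item, h5py.Dataset):
            found.append(item)
        elif isinstance(item, h5py.Group):
            # descend into subgroups and merge their datasets
            found.extend(all_datasets(item))
    return found

# Hypothetical demo file kept in memory (nothing is written to disk)
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as f:
    data = f.create_group('Data')
    data.create_dataset('a', data=[1, 2, 3])
    data.create_group('sub').create_dataset('b', data=[4, 5])
    names = sorted(d.name for d in all_datasets(f['Data']))
    print(names)  # ['/Data/a', '/Data/sub/b']
```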

James Smith answered Oct 06 '22 07:10


I prefer this solution. It collects the names of all objects in the HDF5 file "h5file", then sorts them by class, similar to what has been mentioned before but more succinctly:

import h5py

# h5file is the path to the HDF5 file
all_h5_objs = []  # visit() appends the name of every group and dataset
fh5 = h5py.File(h5file,'r')
fh5.visit(all_h5_objs.append)
all_groups   = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Group) ]
all_datasets = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Dataset) ]
fh5.close()

Scott N answered Oct 06 '22 06:10
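

The question also asks about links. One possible way to tell them apart, sketched here under the assumption that only the direct members of one group are of interest, is Group.get with getlink=True, which returns a link object (h5py.HardLink, h5py.SoftLink or h5py.ExternalLink) instead of the object the link points to. The helper name link_kinds and the demo file contents are illustrative only.

```python
import h5py

def link_kinds(group):
    """Map each member name of group to the kind of link behind it."""
    kinds = {}
    for name in group:
        # getlink=True returns the link object, not the target
        link = group.get(name, getlink=True)
        if isinstance(link, h5py.SoftLink):
            kinds[name] = 'soft'
        elif isinstance(link, h5py.ExternalLink):
            kinds[name] = 'external'
        else:
            kinds[name] = 'hard'
    return kinds

# Hypothetical demo file kept in memory (nothing is written to disk)
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('y', data=[1, 2, 3])
    f['alias'] = h5py.SoftLink('/y')  # soft link pointing at the dataset
    kinds = link_kinds(f)
    print(kinds)
```

Ordinary groups and datasets show up as hard links here, so this complements the isinstance checks on the objects themselves.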