How to differentiate between HDF5 datasets and groups with h5py?

Tags: python, hdf5, h5py

I use the Python package h5py (version 2.5.0) to access my hdf5 files.

I want to traverse the content of a file and do something with every dataset.

Using the visit method:

import h5py

def print_it(name):
    dset = f[name]
    print(dset)
    print(type(dset))


with h5py.File('test.hdf5', 'r') as f:
    f.visit(print_it)

for a test file I obtain:

<HDF5 group "/x" (1 members)>
<class 'h5py._hl.group.Group'>
<HDF5 dataset "y": shape (100, 100, 100), type "<f8">
<class 'h5py._hl.dataset.Dataset'>

which tells me that the file contains a group and a dataset. However, there is no obvious way to differentiate between datasets and groups other than using type(). Unfortunately, the h5py documentation does not say anything about this topic. It always assumes that you know beforehand which items are groups and which are datasets, for example because you created them yourself.

I would like to have something like:

f = h5py.File(..)
for key in f.keys():
    x = f[key]
    print(x.is_group(), x.is_dataset()) # does not exist

How can I differentiate between groups and datasets when reading an unknown hdf5 file in Python with h5py? How can I get a list of all datasets, of all groups, of all links?


asked Dec 17 '15 08:12

Trilarion

5 Answers

Unfortunately, there is no built-in method in the h5py API to check this, but you can simply check the type of the item with is_dataset = isinstance(item, h5py.Dataset).

To list the entire content of the file (except the file's attributes), you can use Group.visititems with a callable that takes the name and the instance of an item.
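A minimal sketch of that approach, assuming you only need the names. The helper name collect_groups_and_datasets and the in-memory demo file are made up for illustration; visititems, h5py.Group and h5py.Dataset are the real h5py names.

```python
import h5py

def collect_groups_and_datasets(h5obj):
    """Return (group_names, dataset_names) for everything below h5obj."""
    groups, datasets = [], []

    def visitor(name, obj):
        # visititems passes the path relative to h5obj and the opened object
        if isinstance(obj, h5py.Dataset):
            datasets.append(name)
        elif isinstance(obj, h5py.Group):
            groups.append(name)
        # returning None tells visititems to continue with the next item

    h5obj.visititems(visitor)
    return groups, datasets

# Demo on a throwaway in-memory file (nothing is written to disk)
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_group("x").create_dataset("y", shape=(2, 3), dtype="f8")
    groups, datasets = collect_groups_and_datasets(f)
    print("groups:", groups)
    print("datasets:", datasets)
```

The names are relative to the object you call it on, so the same helper should also work on a subgroup such as f["x"].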

answered Oct 06 '22 06:10

Gall

While the answers by Gall and James Smith indicate the general solution, the traversal of the hierarchical HDF5 structure and the filtering of all datasets still need to be implemented. I did it with yield from, which is available in Python 3.3+ and works quite nicely, so I present it here.

import h5py

def h5py_dataset_iterator(g, prefix=''):
    for key, item in g.items():
        path = '{}/{}'.format(prefix, key)
        if isinstance(item, h5py.Dataset): # test for dataset
            yield (path, item)
        elif isinstance(item, h5py.Group): # test for group (go down)
            yield from h5py_dataset_iterator(item, path)

with h5py.File('test.hdf5', 'r') as f:
    for (path, dset) in h5py_dataset_iterator(f):
        print(path, dset)

answered Oct 06 '22 05:10

Trilarion

For example, if you want to print the structure of an HDF5 file, you can use the following code:

import h5py

def h5printR(item, leading = ''):
    for key in item:
        if isinstance(item[key], h5py.Dataset):
            print(leading + key + ': ' + str(item[key].shape))
        else:
            print(leading + key)
            h5printR(item[key], leading + '  ')

# Print structure of a `.h5` file            
def h5print(filename):
    with h5py.File(filename, 'r') as h:
        print(filename)
        h5printR(h, '  ')

Example

>>> h5print('/path/to/file.h5')

file.h5
  test
    repeats
      cell01: (2, 300)
      cell02: (2, 300)
      cell03: (2, 300)
      cell04: (2, 300)
      cell05: (2, 300)
    response
      firing_rate_10ms: (28, 30011)
    stimulus: (300, 50, 50)
    time: (300,)

answered Oct 06 '22 06:10

Yas

Because h5py groups behave like Python dictionaries, you need to use the values() method to actually access the items. So you may be able to use a list comprehension as a filter:

datasets = [item for item in f["Data"].values() if isinstance(item, h5py.Dataset)]

Doing this recursively should be simple enough.
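The filter above looks at the objects that the names resolve to. For the links part of the question, one possible sketch is to ask the group for the link object instead, using Group.get with getlink=True. The helper name classify_links and the demo file are assumptions for illustration only.

```python
import h5py

def classify_links(group):
    """Map each member name of `group` to 'hard', 'soft', 'external' or 'unknown'."""
    kinds = {}
    for name in group:
        # getlink=True returns an object describing the link, not the target
        link = group.get(name, getlink=True)
        if isinstance(link, h5py.SoftLink):
            kinds[name] = "soft"
        elif isinstance(link, h5py.ExternalLink):
            kinds[name] = "external"
        elif isinstance(link, h5py.HardLink):
            kinds[name] = "hard"
        else:
            kinds[name] = "unknown"
    return kinds

# Demo on a throwaway in-memory file (nothing is written to disk)
with h5py.File("links_demo.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("data", data=[1, 2, 3])
    f["alias"] = h5py.SoftLink("/data")  # soft link pointing at the dataset
    print(classify_links(f))
```

This only inspects the direct members of one group; it would have to be combined with a recursive traversal to cover a whole file.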

answered Oct 06 '22 07:10

James Smith

I prefer this solution. It collects the names of all objects in the HDF5 file "h5file", then sorts them by class, similar to what has been mentioned before but more succinctly:

import h5py
fh5 = h5py.File(h5file,'r')
all_h5_objs = []  # names of every group and dataset, filled by visit()
fh5.visit(all_h5_objs.append)
all_groups   = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Group) ]
all_datasets = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Dataset) ]

answered Oct 06 '22 06:10

Scott N
