
Can you view hdf5 files in pycharm?

Is there a way / plugin to view hdf5 files in pycharm so that you don't have to install HDFVIEW to manually check a file?

Claudiu Creanga asked Oct 16 '22 19:10


1 Answer

You can use the h5py library.

If you don't know the structure of your HDF5 file in advance, you can use a function to iterate over every dataset path in the file. Here's an example:

import h5py


def traverse_datasets(hdf_file):
    """Print the path and summary of every dataset in an HDF5 file."""

    def h5py_dataset_iterator(g, prefix=''):
        # Recursively yield (path, dataset) pairs found below group g.
        for key in g.keys():
            item = g[key]
            path = '{}/{}'.format(prefix, key)
            if isinstance(item, h5py.Dataset):  # dataset: report it
                yield (path, item)
            elif isinstance(item, h5py.Group):  # group: descend into it
                yield from h5py_dataset_iterator(item, path)

    with h5py.File(hdf_file, 'r') as f:
        for path, dset in h5py_dataset_iterator(f):
            print(path, dset)

Example usage:

traverse_datasets('file.h5')

/DataSet1 <HDF5 dataset "DataSet1": shape (655559, 260), type "<f4">
/DataSet2 <HDF5 dataset "DataSet2": shape (22076, 10000), type "<f4">
/index <HDF5 dataset "index": shape (677635,), type "|V384">

Then to read a particular dataset, you can pick a path:

import h5py

with h5py.File('file.h5', 'r') as f:
    arr = f['/DataSet1'][:]  # read entire dataset into memory

If the dataset is too large to hold in memory, you can either print or process it iteratively, one block at a time, or read only a slice into memory. The h5py documentation has numerous examples, and the slicing syntax follows NumPy conventions.
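
As a rough sketch of both options, assuming the dataset is sliced along its first axis: the helper names iter_row_chunks, print_chunk_shapes and read_slice below are made up for illustration and are not part of h5py. The only h5py behaviour relied on is that a dataset supports len() and NumPy-style slicing, and that slicing returns just the requested rows as a NumPy array.

```python
def iter_row_chunks(dset, chunk_rows=1000):
    """Yield consecutive blocks of at most chunk_rows rows from dset.

    Works on anything that supports len() and slicing along its first
    axis, which includes h5py datasets.
    """
    for start in range(0, len(dset), chunk_rows):
        # For an h5py dataset, this slice reads only these rows from disk.
        yield dset[start:start + chunk_rows]


def print_chunk_shapes(hdf_file, dset_path, chunk_rows=1000):
    """Open hdf_file and print the shape of each row block of dset_path."""
    import h5py  # imported here so iter_row_chunks has no hard dependency

    with h5py.File(hdf_file, 'r') as f:
        for block in iter_row_chunks(f[dset_path], chunk_rows):
            print(block.shape)  # each block is a NumPy array


def read_slice(hdf_file, dset_path, n_rows, n_cols):
    """Read only the top-left n_rows x n_cols corner of a 2-D dataset."""
    import h5py

    with h5py.File(hdf_file, 'r') as f:
        return f[dset_path][:n_rows, :n_cols]
```

For the file above, something like print_chunk_shapes('file.h5', '/DataSet1', chunk_rows=100000) would walk through the dataset without ever holding more than 100000 rows at once, and read_slice('file.h5', '/DataSet1', 100, 10) would load only a small corner of it. The block size is a guess; pick one that fits your memory.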

jpp answered Oct 20 '22 19:10