Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read HDF5 files in Python

Tags:

python

hdf5

I am trying to read data from hdf5 file in Python. I can read the hdf5 file using h5py, but I cannot figure out how to access data within the file.

My code

import h5py    
import numpy as np    
f1 = h5py.File(file_name,'r+')    

This works and the file is read. But how can I access data inside the file object f1?

like image 494
Sameer Damir Avatar asked Jan 27 '15 12:01

Sameer Damir


People also ask

How do I open an HDF5 file in Python?

To use HDF5, numpy needs to be imported. One important feature is that it can attach metaset to every data in the file thus provides powerful searching and accessing. Let's get started with installing HDF5 to the computer. As HDF5 works on numpy, we would need numpy installed in our machine too.

How do I read my HDF5 data?

Reading HDF5 files To open and read data we use the same File method in read mode, r. To see what data is in this file, we can call the keys() method on the file object. We can then grab each dataset we created above using the get method, specifying the name. This returns a HDF5 dataset object.

How do I open h5 files in Jupyter notebook?

Double clicking on an . hdf5 file in the file browser will open it in a special HDF browser. You can then browse through the groups and open the datasets in the . hdf5 file.


3 Answers

Read HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys) 
    # these can be group or dataset names 
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key])) 

    # If a_group_key is a group name, 
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name, 
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array

Write HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)

See h5py docs for more information.

Alternatives

  • JSON: Nice for writing human-readable data; VERY commonly used (read & write)
  • CSV: Super simple format (read & write)
  • pickle: A Python serialization format (read & write)
  • MessagePack (Python package): More compact representation (read & write)
  • HDF5 (Python package): Nice for matrices (read & write)
  • XML: exists too *sigh* (read & write)

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

like image 136
Martin Thoma Avatar answered Oct 21 '22 17:10

Martin Thoma


Reading the file

import h5py

f = h5py.File(file_name, mode)

Studying the structure of the file by printing what HDF5 groups are present

for key in f.keys():
    print(key) #Names of the root level object names in HDF5 file - can be groups or datasets.
    print(type(f[key])) # get the object type: usually group or dataset

Extracting the data

#Get the HDF5 group; key needs to be a group name from above
group = f[key]

#Checkout what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset, 
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()
like image 33
Daksh Avatar answered Oct 21 '22 18:10

Daksh


you can use Pandas.

import pandas as pd
pd.read_hdf(filename,key)
like image 24
Danny Avatar answered Oct 21 '22 16:10

Danny