I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
import h5py
import numpy as np
f1 = h5py.File(file_name,'r+')
This works and the file is read. But how can I access the data inside the file object f1?
To get started, install h5py. Because h5py works on NumPy, NumPy needs to be installed on the machine as well. One important feature of HDF5 is that metadata can be attached to every dataset in the file, which makes the data easier to search and access.
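As a rough illustration of the metadata feature, h5py exposes it through the attrs property of files, groups and datasets. The sketch below writes to a temporary file; the file name, the dataset name "temperature" and the attribute names are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, dataset and attribute names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.hdf5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("temperature", data=np.arange(5.0))
    # Attributes are small key/value metadata stored next to the data.
    dset.attrs["units"] = "celsius"
    f.attrs["description"] = "demo file"

with h5py.File(path, "r") as f:
    units = f["temperature"].attrs["units"]
    description = f.attrs["description"]
    attr_names = sorted(f["temperature"].attrs.keys())

print(units, description, attr_names)
```

Attributes are meant for small values such as units or descriptions; larger arrays belong in datasets.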
Reading HDF5 files
To open a file and read its data, use h5py.File in read mode, "r". To see what the file contains, call the keys() method on the file object. Each dataset can then be retrieved by name with the get method, which returns an HDF5 dataset object.
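A minimal sketch of the keys() and get() steps described above. It first writes a small temporary file so that it runs on its own; the file name and the dataset names "first" and "second" are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "read_demo.hdf5")

with h5py.File(path, "w") as f:
    f.create_dataset("first", data=np.arange(6).reshape(2, 3))
    f.create_dataset("second", data=np.ones(4))

with h5py.File(path, "r") as f:
    names = list(f.keys())           # names of the root-level objects
    dset = f.get("first")            # h5py Dataset object
    first_shape = dset.shape         # shape is available without loading values
    first_values = dset[()]          # copy all values into a NumPy array
    missing = f.get("no_such_name")  # get() returns None for unknown names

print(names, first_shape, missing)
```

Unlike f["name"], which raises KeyError for a missing name, get() returns None (or a default you pass in), which can be handy when probing an unfamiliar file.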
Double clicking on an .hdf5 file in the file browser will open it in a special HDF browser. You can then browse through the groups and open the datasets in the .hdf5 file.
import h5py

filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys);
    # these can be group or dataset names
    print("Keys: %s" % f.keys())
    # Get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]
    # Get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key]))
    # If a_group_key is a group name,
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])
    # If a_group_key is a dataset name,
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # Preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array
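Since a key may name either a group or a dataset, it can help to check the type explicitly instead of guessing. The sketch below does that with isinstance and then uses visititems to collect the paths of all datasets, including nested ones. The file layout ("top_level", "measurements/run1") and the helper names describe and collect are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file layout, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "tree_demo.hdf5")

with h5py.File(path, "w") as f:
    f.create_dataset("top_level", data=np.zeros(3))
    grp = f.create_group("measurements")
    grp.create_dataset("run1", data=np.arange(4))


def describe(obj):
    """Return a short label for an h5py object."""
    if isinstance(obj, h5py.Dataset):
        return "dataset"
    if isinstance(obj, h5py.Group):
        return "group"
    return "other"


dataset_paths = []


def collect(name, obj):
    """visititems callback: remember the path of every dataset."""
    if isinstance(obj, h5py.Dataset):
        dataset_paths.append(name)


with h5py.File(path, "r") as f:
    # Label each root-level key as group or dataset.
    kinds = {name: describe(f[name]) for name in f.keys()}
    # Walk the whole hierarchy, not just the root level.
    f.visititems(collect)

print(kinds)
print(sorted(dataset_paths))
```

The collected paths can be passed straight back to the file object, for example f["measurements/run1"][()], while the file is open.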
Writing the file
import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
See h5py docs for more information.
For your application, the following might be important:
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Reading the file
import h5py
f = h5py.File(file_name, mode)
Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
    print(key)  # Names of the root level objects in the HDF5 file; can be groups or datasets.
    print(type(f[key]))  # Get the object type: usually group or dataset
Extracting the data
# Get the HDF5 group; key needs to be a group name from above
group = f[key]

# Check out what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset,
# and returns a np.array:
data = group[some_key_inside_the_group][()]
# Do whatever you want with data

# After you are done
f.close()
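The same steps can also be written with a with block, so the file is closed automatically, and a dataset can be sliced to read only part of it from disk. This is just a sketch on a temporary file; the group name "results" and dataset name "matrix" are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group and dataset names, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "slice_demo.hdf5")

with h5py.File(path, "w") as f:
    group = f.create_group("results")
    group.create_dataset("matrix", data=np.arange(20).reshape(4, 5))

with h5py.File(path, "r") as f:
    dset = f["results/matrix"]  # path-style access through the group
    first_row = dset[0, :]      # reads only the first row
    column = dset[:, 2]         # reads only the third column
    everything = dset[()]       # reads the whole dataset

# The file is closed here, but the NumPy arrays remain usable.
print(first_row, column, everything.shape)
```

Note that the dataset object itself (dset) cannot be read after the file is closed; only the arrays copied out of it stay valid.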
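You can also use pandas. Note that read_hdf requires the PyTables package and is intended for files that were written by pandas (for example with DataFrame.to_hdf).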
import pandas as pd
pd.read_hdf(filename,key)