
HDF5 for Python: high level vs low level interfaces. h5py

I've been working with HDF5 files in C and Matlab, both of which follow the same sequence for reading from and writing to datasets:

  • open file with h5f
  • open dataset with h5d
  • select space with h5s

and so on...

But now I'm working with Python, and I see that its h5py library offers two ways to work with HDF5: a high-level and a low-level interface. With the former, it takes fewer lines of code to get the data for a single variable out of the file.

Is there any noticeable loss of performance when using the high-level interface, for example when a file contains many variables and we need to read just one of them?
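For concreteness, here is a minimal sketch of reading one variable through each interface. The file and the dataset names (`temperature`, `pressure`) are made up for illustration, and the low-level half assumes the `h5f`/`h5d`/`h5s` modules behave like their C counterparts, which is how h5py describes them.

```python
import os
import tempfile

import numpy as np
import h5py
from h5py import h5d, h5f, h5s

# Build a small throwaway file with two datasets (hypothetical names).
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("temperature", data=np.arange(10.0))
    f.create_dataset("pressure", data=np.ones(5))

# High-level interface: the file acts like a dict, the dataset like an array.
with h5py.File(path, "r") as f:
    high = f["temperature"][...]  # reads only this dataset's data

# Low-level interface: mirrors the C calls H5Fopen, H5Dopen, H5Dread.
fid = h5f.open(os.fsencode(path), h5f.ACC_RDONLY)
dsid = h5d.open(fid, b"temperature")
low = np.empty(dsid.shape, dtype=dsid.dtype)  # caller allocates the buffer
dsid.read(h5s.ALL, h5s.ALL, low)  # h5s.ALL selects the whole dataspace
fid.close()

print(np.array_equal(high, low))
```

In both cases only the requested dataset is read; the other datasets in the file are not loaded.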

Nicolás Ozimica asked Nov 11 '11 16:11




1 Answer

High-level interfaces generally come with a performance cost of some sort. Whether that cost is noticeable (and worth investigating) depends on exactly what your code is doing.

Just start with the high-level interface. If the code turns out to be too slow overall, profile it, move the bottlenecks down to the lower-level interface, and see if it helps.
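As a rough sketch of that workflow, the snippet below times the same read done through the high-level slicing syntax and through the low-level `DatasetID` that every high-level dataset exposes as `.id`. The file layout (50 datasets named `var0` to `var49`) and the repeat count are arbitrary assumptions, and `timeit` stands in for a real profiler; the numbers will depend on your machine and data, so measure your own case rather than trusting this one.

```python
import os
import tempfile
import timeit

import numpy as np
import h5py
from h5py import h5s

# Throwaway file with many variables (hypothetical names var0..var49).
path = os.path.join(tempfile.mkdtemp(), "many_vars.h5")
with h5py.File(path, "w") as f:
    for i in range(50):
        f.create_dataset(f"var{i}", data=np.random.rand(1000))

with h5py.File(path, "r") as f:
    dset = f["var7"]  # we only care about one variable
    out = np.empty(dset.shape, dtype=dset.dtype)  # reusable buffer

    def read_high():
        # High-level read: allocates and returns a new array each call.
        return dset[...]

    def read_low():
        # Low-level read into a preallocated buffer via the DatasetID.
        dset.id.read(h5s.ALL, h5s.ALL, out)
        return out

    t_high = timeit.timeit(read_high, number=1000)
    t_low = timeit.timeit(read_low, number=1000)

print(f"high-level: {t_high:.4f} s, low-level: {t_low:.4f} s")
```

Because the high-level objects expose their low-level identifiers, a hot spot can be rewritten in place without converting the rest of the code.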

lgautier answered Nov 10 '22 12:11
