I've just discovered the HDF5 format and I'm considering using it to store 3D data spread over a cluster of Java application servers. I have found that there are several implementations available for Java, and I would like to know the differences between them:
Java HDF5 Interface (JHI5): the Java wrapper from the HDF Group itself
JHDF5 (HDF5 for Java)
Permafrost
Nujan: Pure Java NetCDF4 and HDF5 writer (cannot read HDF5)
Most importantly, I would like to know:
How much of the native API is covered, and are there any limitations that do not exist in the native API?
Is there support for "Parallel HDF5"?
Once my 3D data is loaded, do I get a "native call overhead" each time I access one element in a 3D array? That is, does the data actually get turned into Java objects, or does it stay in "native/JNI memory"?
Are there any known stability problems with a particular implementation, since a crash in native code normally takes the whole JVM down?
The Hierarchical Data Format version 5 (HDF5) is an open source file format that supports large, complex, heterogeneous data. HDF5 uses a structure similar to a file directory, which lets you organize the data within a file in many different structured ways, much as you might organize files on your computer.
Open an HDF5 file in HDFView. If you click on the name of the file in the left-hand window of HDFView, you can view the file's metadata, which is shown in the bottom window of the application.
HDF5 is a specification and format for creating hierarchical data from very large data sources. In HDF5, data is organized in a file, and the file object acts as the / (root) group of the hierarchy. As in a UNIX file system, the datasets and their groups are organized as an inverted tree.
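To make the "file as root group" idea concrete, here is a toy in-memory model of such an inverted tree. It is only an illustration under my own assumptions: Hdf5TreeSketch and Node are hypothetical names, and this is not the API of JHI5, JHDF5 or any other library mentioned here. Groups hold children, datasets are leaves, and paths are resolved from "/" the way a UNIX path is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of an HDF5-style hierarchy (not a real HDF5 API).
final class Hdf5TreeSketch {

    // A node is either a group (may have children) or a dataset (a leaf).
    static final class Node {
        final boolean isGroup;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(boolean isGroup) {
            this.isGroup = isGroup;
        }
    }

    // The file object itself plays the role of the root group "/".
    private final Node root = new Node(true);

    // Adds a dataset at an absolute path such as "/sim/run1/density",
    // creating any missing intermediate groups along the way.
    void addDataset(String path) {
        if (!path.startsWith("/") || path.length() < 2) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        String[] parts = path.substring(1).split("/");
        Node current = root;
        for (int i = 0; i < parts.length; i++) {
            boolean leaf = (i == parts.length - 1);
            Node child = current.children.get(parts[i]);
            if (child == null) {
                // Intermediate names become groups, the last name a dataset.
                child = new Node(!leaf);
                current.children.put(parts[i], child);
            } else if (leaf || !child.isGroup) {
                // Name already taken, or we would descend into a dataset.
                throw new IllegalArgumentException("path conflict at " + parts[i]);
            }
            current = child;
        }
    }

    // Walks down from the root; returns null if any path component is missing.
    Node resolve(String path) {
        Node current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the leading "/" (and any "//")
            }
            current = current.children.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

For example, adding "/sim/run1/density" implicitly creates the groups /sim and /sim/run1, while trying to add "/sim/run1/density/x" fails because density is a dataset, not a group.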
HDF Java follows a layered approach:
JHI5 - the low-level JNI wrappers: very flexible, but also quite tedious to use.
Java HDF object package - a high-level interface based on JHI5.
HDFView - a Java-based viewer application based on the Java HDF object package.
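On the "native call overhead" question: with JHI5, a dataset (or a hyperslab of it) is typically read in one call, such as H5.H5Dread, into a Java primitive array like a float[]. After that, element access is ordinary array indexing on the JVM heap, with no JNI call per element. HDF5 stores multi-dimensional data in row-major (C) order, so a 3D dataset read into a flat array can be addressed as in the sketch below. FlatGrid3D is a hypothetical helper of my own, not part of any of the libraries discussed, and it assumes the buffer has already been filled.

```java
// Hypothetical helper: addresses a 3D dataset stored in a flat Java array.
// Assumes the buffer holds nx*ny*nz values in row-major (C) order, i.e.
// the last dimension (z) varies fastest, which is HDF5's storage order.
final class FlatGrid3D {
    private final int nx, ny, nz;
    private final float[] data; // plain JVM heap memory, no JNI involved

    FlatGrid3D(int nx, int ny, int nz, float[] data) {
        if (data.length != (long) nx * ny * nz) {
            throw new IllegalArgumentException("buffer size does not match dimensions");
        }
        this.nx = nx;
        this.ny = ny;
        this.nz = nz;
        this.data = data; // wraps the array, does not copy it
    }

    // Row-major offset: ((x * ny) + y) * nz + z, with bounds checks so that
    // an out-of-range index cannot silently alias another element.
    private int offset(int x, int y, int z) {
        if (x < 0 || x >= nx || y < 0 || y >= ny || z < 0 || z >= nz) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ", " + z + ")");
        }
        return (x * ny + y) * nz + z;
    }

    float get(int x, int y, int z) {
        return data[offset(x, y, z)];
    }

    void set(int x, int y, int z, float value) {
        data[offset(x, y, z)] = value;
    }
}
```

The trade-off of this layout is that the whole dataset (or the hyperslab you selected) must fit in the Java heap; for very large data you would read it in slabs rather than all at once.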
JHDF5 provides a high-level interface, built on the JHI5 layer, that exposes most of the functionality of HDF5 to Java. The API has a gentle learning curve and hides most of the housekeeping work from the developer. You can run the Java HDF object package (and HDFView) on the JHI5 interface that is part of JHDF5, so the two APIs can coexist within one Java program.
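Much of the "housekeeping" at the JHI5 level is resource management: every identifier you open (file, dataset, dataspace, datatype, property list) has to be released with the matching close call, such as H5Dclose, H5Sclose or H5Fclose. The sketch below shows one way a wrapper can hide that; HandleScope is a hypothetical class, not how JHDF5 is actually implemented, and it is demonstrated with plain long ids instead of real HDF5 handles.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongConsumer;

// Hypothetical helper: remembers every opened id together with the function
// that closes it, and releases them all in reverse order of acquisition.
final class HandleScope implements AutoCloseable {
    private final Deque<Runnable> closers = new ArrayDeque<>();

    // Registers an already-opened id and returns it, so calls can be chained.
    long track(long id, LongConsumer closer) {
        closers.push(() -> closer.accept(id)); // push = add to the front (LIFO)
        return id;
    }

    // Closes everything, even if one close fails; the first failure is
    // rethrown and later ones are attached to it as suppressed exceptions.
    @Override
    public void close() {
        RuntimeException first = null;
        while (!closers.isEmpty()) {
            try {
                closers.pop().run();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Used in a try-with-resources block, the dataspace would be closed before the dataset, and the dataset before the file, which mirrors the order in which you would call the JHI5 close functions by hand.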
Permafrost and Nujan seem far from complete, and Permafrost hasn't seen much activity recently, so neither appears to be a good first choice at this point in time.
I think a good path for you is to have a look at both the Java HDF object package and JHDF5, decide which of the two APIs fits your needs better, and go with that one.
Disclaimer: I have worked on the JHDF5 interface, so I may be biased.
I just wanted to point out another option: jhdf.io, a pure Java library for HDF5. It is currently read-only and doesn't cover the full HDF5 specification, but it can open and read many HDF5 files, and I hope to improve it over time. Because it is pure Java, it is much easier to integrate into other Java projects than the other options, and it avoids the issues associated with JNI.
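To illustrate what "pure Java" means at the lowest level, here is a sketch of the very first step any HDF5 reader performs: locating the superblock. Per the HDF5 file format specification, the superblock starts with an 8-byte signature (0x89 'H' 'D' 'F' '\r' '\n' 0x1A '\n') and may be found at byte offset 0, 512, 1024, 2048, and so on. This is not jhdf's actual code; Hdf5Signature is a hypothetical name, and the ByteBuffer could come from a wrapped byte array or a memory-mapped file.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: finds the HDF5 superblock signature using only
// standard Java, with no native library involved.
final class Hdf5Signature {
    static final byte[] SIGNATURE = {
        (byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n'
    };

    // Returns the byte offset of the superblock, or -1 if none is found.
    // Candidate offsets are 0, then 512, then each previous offset times two.
    static long findSuperblock(ByteBuffer data) {
        int size = data.limit();
        // long offset avoids int overflow when doubling near the 2 GiB limit
        for (long offset = 0; offset + SIGNATURE.length <= size;
             offset = (offset == 0) ? 512 : offset * 2) {
            if (matchesAt(data, (int) offset)) {
                return offset;
            }
        }
        return -1;
    }

    // Absolute reads: does not change the buffer's position.
    private static boolean matchesAt(ByteBuffer data, int offset) {
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (data.get(offset + i) != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A problem in code like this surfaces as an ordinary Java exception rather than a native crash, which is the practical side of avoiding JNI that the stability question above is concerned with.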
Disclaimer: I am the author of the jhdf library.