
Python, NetCDF4 and HDF5

I don't know why these packages are always such a pain to install. I have been using NetCDF/HDF5 for a long time now, and getting them to install or run properly has always been a pure horror trip, whether on Linux or OS X, and whether from C, C++ or now Python. The simple dependency between netcdf4 and hdf5 is a source of great pain for many people, and I really wish the developers of those packages would finally do something about it.

So, the latest concrete problem I am facing is this: I am trying to install netCDF4 for Python. I get the following error:

Package hdf5 was not found in the pkg-config search path
Perhaps you should add the directory containing `hdf5.pc'

I tried to install the hdf5 packages using apt-get, including:

  • libhdf5-serial-dev
  • libhdf5-serial
  • libhdf5-7
  • python-h5py
  • libhdf5-dev
  • hdf5-tools
  • hdf5-helpers
  • libhdf5-7-dbg

Using pip, I tried:

pip install h5py

which failed miserably to resolve a dependency on Cython, which I then installed manually. After that it (apparently) installed, but I cannot find the file hdf5.pc anywhere.

I am pulling my hair out here. Does anyone know how to work around this problem?

Jürgen Simon asked Mar 28 '16 15:03



1 Answer

When you mix distribution packages and self-built packages, you are increasing your chance of problems (as you are finding out).

Also, do you want h5py or do you want netcdf-python? I don't think netcdf-python has a dependency on h5py. Rather, netcdf-python binds to the C netcdf library, which in turn depends on the C HDF5 library.

h5py likewise binds to the C HDF5 library.

There is a lot of software involved, it's true. Work your way through step by step and it will make more sense eventually (says the guy who has been doing this for 15 years... it gets easier!)

  1. If you are going to do any parallel programming, you'll need an MPI implementation
  2. HDF5 now provides the foundation for NetCDF4. If you want parallel programming, build HDF5 against your MPI implementation.
  3. Install the C library of NetCDF4
  4. Now the Python bindings can pick up what they need from NetCDF4, HDF5, and MPI.
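
To see how far along that list a given machine already is, something like the following might help. It is only a sketch: it assumes each layer puts its usual helper on your PATH (mpicc for MPI, h5pcc or h5cc for a parallel or serial HDF5 build, nc-config for the NetCDF C library), and the names LAYERS and find_layers are made up for this example.

```python
import shutil

# Helper programs each layer normally installs (assumption: default names).
# h5pcc comes with a parallel HDF5 build, h5cc with a serial one.
LAYERS = [
    ("MPI (optional)", ["mpicc"]),
    ("HDF5", ["h5pcc", "h5cc"]),
    ("NetCDF C library", ["nc-config"]),
]


def find_layers(path=None):
    """Return {layer name: path of the first helper found, or None}."""
    found = {}
    for name, tools in LAYERS:
        hits = (shutil.which(tool, path=path) for tool in tools)
        found[name] = next((hit for hit in hits if hit), None)
    return found


if __name__ == "__main__":
    for name, location in find_layers().items():
        print(f"{name:18} {location or 'not found on PATH'}")
```

A layer reported as missing is not necessarily absent; it may simply be installed under a prefix whose bin directory is not on PATH yet.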

Yes, it is a lot of software to configure and build. pkg-config can help a lot here! When you see "Package hdf5 was not found in the pkg-config search path", that means you should adjust your PKG_CONFIG_PATH to point to the directory containing the package-config (.pc) files. Unfortunately, hdf5 doesn't provide a .pc file, so you'll have to do that part by hand. Oh, and netcdf doesn't provide a pkg-config file either: it provides a script, nc-config, that netcdf-python will use.
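
For libraries that do ship a .pc file, pkg-config reads its extra search directories from the PKG_CONFIG_PATH environment variable. Here is a rough sketch of adjusting it and asking pkg-config whether a package is visible. The helper names and the directory in the demo are hypothetical; .pc files usually live under <prefix>/lib/pkgconfig, but check your own install.

```python
import os
import shutil
import subprocess


def env_with_pkgconfig_dir(pc_dir, env=None):
    """Return a copy of env with pc_dir prepended to PKG_CONFIG_PATH."""
    env = dict(os.environ if env is None else env)
    current = env.get("PKG_CONFIG_PATH", "")
    env["PKG_CONFIG_PATH"] = pc_dir + (os.pathsep + current if current else "")
    return env


def pkgconfig_has(package, env=None):
    """True if pkg-config finds `package`; False if not, or if pkg-config is missing."""
    exe = shutil.which("pkg-config")
    if exe is None:
        return False
    # `pkg-config --exists NAME` prints nothing and exits 0 only if NAME is found.
    result = subprocess.run([exe, "--exists", package], env=env)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical directory, used only for illustration.
    pc_dir = os.path.expanduser("~/work/soft/example/lib/pkgconfig")
    env = env_with_pkgconfig_dir(pc_dir)
    print("PKG_CONFIG_PATH =", env["PKG_CONFIG_PATH"])
    print("hdf5 visible to pkg-config:", pkgconfig_has("hdf5", env))
```

If this prints False for hdf5 even after adjusting the path, that matches the situation described above: there is simply no hdf5.pc to find.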

Let me provide a concrete example:

  • MPICH-master installed in /home/robl/work/soft/mpich
  • HDF5 installed in /home/robl/work/soft/hdf5-1.8.16
    • e.g. configured like ../../hdf5-1.8.16/configure --prefix=/home/robl/work/soft/hdf5-1.8.16 CC=/home/robl/work/soft/mpich/bin/mpicc --enable-parallel
  • NetCDF4 installed in /home/robl/work/soft/netcdf-master
    • e.g. configured like ./configure CC=${HOME}/work/soft/mpich/bin/mpicc --prefix=${HOME}/work/soft/netcdf-master CPPFLAGS=-I${HOME}/work/soft/hdf5-1.8.16/include LDFLAGS=-L${HOME}/work/soft/hdf5-1.8.16/lib

Now you have all the prerequisites for netcdf-python.

By the way, http://unidata.github.io/netcdf4-python/ lays out the prerequisites and the necessary configure options.

Don't get hung up on the carping about hdf5.pc. If you have nc-config in your path, it will provide the needed information.
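
If you want to see what nc-config will report before starting the build, a small wrapper along these lines should do. It is a sketch, not what the netcdf-python setup script itself runs; the function name nc_config is made up, and the flags shown (--version, --prefix, --cflags, --libs) are standard nc-config options.

```python
import shutil
import subprocess


def nc_config(flag):
    """Return the output of `nc-config <flag>`, or None if nc-config is not on PATH."""
    exe = shutil.which("nc-config")
    if exe is None:
        return None
    result = subprocess.run([exe, flag], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    for flag in ("--version", "--prefix", "--cflags", "--libs"):
        print(f"nc-config {flag:10} -> {nc_config(flag)}")
```

If every line shows None, nc-config is not on your PATH, and the build will not find the NetCDF C library that way.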

If you are building for parallel programming, set CC to your MPI compiler. If not, you can skip the "export CC=..." step:

cd netcdf-python
export CC=${HOME}/work/soft/mpich/bin/mpicc
export PATH=${HOME}/work/soft/netcdf-master/bin:${PATH}
python setup.py build
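
Once the build has also been installed (the steps above stop at the build; an install step such as python setup.py install is assumed here), you can check which C libraries the module actually picked up. A minimal sketch, with describe_netcdf4 being a name invented for this example:

```python
def describe_netcdf4():
    """Return version info for the netCDF4 module, or None if it cannot be imported."""
    try:
        import netCDF4
    except ImportError:
        return None
    return {
        "module": netCDF4.__version__,
        "netcdf C library": netCDF4.__netcdf4libversion__,
        "hdf5 C library": netCDF4.__hdf5libversion__,
    }


if __name__ == "__main__":
    info = describe_netcdf4()
    if info is None:
        print("netCDF4 is not importable from this Python")
    else:
        for name, version in info.items():
            print(f"{name:18} {version}")
```

The reported HDF5 version should match the one you built; if it doesn't, the bindings linked against a different copy, most likely a distribution package.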
Rob Latham answered Sep 23 '22 04:09