I am using the HDF5 C++ API to write 2D array dataset files. The HDF Group has an example that creates an HDF5 file from a statically sized array, which I've modified to suit my needs below. However, I require a dynamic array, where both NX and NY are determined at runtime. I found another solution that uses the "new" keyword to create a dynamic 2D array. Here is what I have:
#include "StdAfx.h"
#include "H5Cpp.h"
using namespace H5;
const H5std_string FILE_NAME("C:\\SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;
int main (void)
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }
    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                         dataspace);
    // Attempt to write data to HDF5 file
    dataset.write(data, PredType::NATIVE_DOUBLE);
    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}
The resulting file, however, is not as expected (output from h5dump):
HDF5 "SDS.h5" {
GROUP "/" {
DATASET "FloatArray" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
DATA {
(0,0): 4.76465e-307, 4.76541e-307, -7.84591e+298, -2.53017e-098, 0,
(0,5): 3.8981e-308,
(1,0): 4.76454e-307, 0, 2.122e-314, -7.84591e+298, 0, 1,
(2,0): 2, 3, 4, 5, -2.53017e-098, -2.65698e+303,
(3,0): 0, 3.89814e-308, 4.76492e-307, 0, 2.122e-314, -7.84591e+298,
(4,0): 1, 2, 3, 4, 5, 6
}
}
}
}
The problem stems from how the 2D array was created, since this example works fine with a statically sized array. As I understand from this email thread:
"The HDF5 library expects a contiguous array of elements, not pointers to elements in lower dimensions."
As I am rather new to C++/HDF5, I'm not sure how to create an array sized at runtime that is a contiguous block of elements. I would rather avoid the "hyperslab" method described in the email thread, as it looks overly complicated. Any help is appreciated.
Well, I don't know anything about HDF5, but dynamic 2D arrays in C++ with a contiguous buffer can be simulated by using a 1D array of size NX * NY. For example:
Allocation:
double *data = new double[NX*NY];
Element access:
data[j*NY + i] (instead of data[j][i])
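As a fuller sketch of the same idea, assuming only the standard library, a std::vector can own the contiguous block so that no manual delete[] is needed. The helper names flat_index and make_grid are illustrative and are not part of HDF5 or of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major offset of element (j, i) in a grid with ny columns,
// stored as one contiguous block.
inline std::size_t flat_index(std::size_t j, std::size_t i, std::size_t ny)
{
    assert(i < ny);
    return j * ny + i;
}

// Build an nx-by-ny grid whose element (j, i) holds i + j,
// matching the fill pattern used in the question.
inline std::vector<double> make_grid(std::size_t nx, std::size_t ny)
{
    std::vector<double> data(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        for (std::size_t i = 0; i < ny; ++i)
            data[flat_index(j, i, ny)] = static_cast<double>(i + j);
    return data;
}
```

With the dataset from the question, a grid built this way could then be passed as dataset.write(grid.data(), PredType::NATIVE_DOUBLE), since grid.data() points at NX*NY doubles laid out row by row.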
Here is how to write N-dimensional arrays in HDF5 format.
It is much better to use the Boost multi_array class. This is the equivalent of using std::vector rather than raw arrays: it does all the memory management for you, and you can access elements as efficiently as with raw arrays using familiar subscripting (e.g. data[12][13] = 46).
Here is a short example:
#include <algorithm>
#include <boost/multi_array.hpp>
using boost::multi_array;
using boost::extents;
// dataset dimensions set at run time
int NX = 5, NY = 6, NZ = 7;
// allocate array using the "extents" helper.
// This makes it easier to see how big the array is
multi_array<double, 3> float_data(extents[NX][NY][NZ]);
// use resize to change size when necessary
// float_data.resize(extents[NX + 5][NY + 4][NZ + 3]);
// This is how you would fill the entire array with a value (e.g. 3.0)
std::fill_n(float_data.data(), float_data.num_elements(), 3.0);
// initialise the array to some variables
for (int ii = 0; ii != NX; ii++)
    for (int jj = 0; jj != NY; jj++)
        for (int kk = 0; kk != NZ; kk++)
            float_data[ii][jj][kk] = ii + jj + kk;
// write to HDF5 format
H5::H5File file("SDS.h5", H5F_ACC_TRUNC);
write_hdf5(file, "doubleArray", float_data );
The last line calls a function which can write multi_arrays of any dimension and any standard number type (ints, chars, floats etc.).
Here is the code for write_hdf5().
First, we must map C++ types to HDF5 types (from the H5 C++ API):
#include <cstdint>
#include "H5Cpp.h"
//!_______________________________________________________________________________________
//!
//! map types to HDF5 types
//!
//!
//! \author lg (04 March 2013)
//!_______________________________________________________________________________________
template<typename T> struct get_hdf5_data_type
{
    static H5::PredType type()
    {
        //static_assert(false, "Unknown HDF5 data type");
        return H5::PredType::NATIVE_DOUBLE;
    }
};
template<> struct get_hdf5_data_type<char> { H5::IntType type { H5::PredType::NATIVE_CHAR }; };
//template<> struct get_hdf5_data_type<unsigned char> { H5::IntType type { H5::PredType::NATIVE_UCHAR }; };
//template<> struct get_hdf5_data_type<short> { H5::IntType type { H5::PredType::NATIVE_SHORT }; };
//template<> struct get_hdf5_data_type<unsigned short> { H5::IntType type { H5::PredType::NATIVE_USHORT }; };
//template<> struct get_hdf5_data_type<int> { H5::IntType type { H5::PredType::NATIVE_INT }; };
//template<> struct get_hdf5_data_type<unsigned int> { H5::IntType type { H5::PredType::NATIVE_UINT }; };
//template<> struct get_hdf5_data_type<long> { H5::IntType type { H5::PredType::NATIVE_LONG }; };
//template<> struct get_hdf5_data_type<unsigned long> { H5::IntType type { H5::PredType::NATIVE_ULONG }; };
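// Note: where int64_t is a typedef for long long (e.g. MSVC), these two duplicate the
// int64_t / uint64_t specializations below and need to be commented out as well.
template<> struct get_hdf5_data_type<long long> { H5::IntType type { H5::PredType::NATIVE_LLONG }; };
template<> struct get_hdf5_data_type<unsigned long long> { H5::IntType type { H5::PredType::NATIVE_ULLONG }; };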
template<> struct get_hdf5_data_type<int8_t> { H5::IntType type { H5::PredType::NATIVE_INT8 }; };
template<> struct get_hdf5_data_type<uint8_t> { H5::IntType type { H5::PredType::NATIVE_UINT8 }; };
template<> struct get_hdf5_data_type<int16_t> { H5::IntType type { H5::PredType::NATIVE_INT16 }; };
template<> struct get_hdf5_data_type<uint16_t> { H5::IntType type { H5::PredType::NATIVE_UINT16 }; };
template<> struct get_hdf5_data_type<int32_t> { H5::IntType type { H5::PredType::NATIVE_INT32 }; };
template<> struct get_hdf5_data_type<uint32_t> { H5::IntType type { H5::PredType::NATIVE_UINT32 }; };
template<> struct get_hdf5_data_type<int64_t> { H5::IntType type { H5::PredType::NATIVE_INT64 }; };
template<> struct get_hdf5_data_type<uint64_t> { H5::IntType type { H5::PredType::NATIVE_UINT64 }; };
template<> struct get_hdf5_data_type<float> { H5::FloatType type { H5::PredType::NATIVE_FLOAT }; };
template<> struct get_hdf5_data_type<double> { H5::FloatType type { H5::PredType::NATIVE_DOUBLE }; };
template<> struct get_hdf5_data_type<long double> { H5::FloatType type { H5::PredType::NATIVE_LDOUBLE }; };
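The specializations above follow a common type-trait pattern. The following HDF5-free sketch shows that pattern in isolation; type_tag and describe_buffer are hypothetical names, and the strings merely stand in for the H5::PredType constants. Leaving the primary template undefined is one way to get the compile-time error that the commented-out static_assert appears to be aiming for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The primary template is declared but never defined, so asking for an
// unmapped type is a compile-time error rather than a silent fallback.
template <typename T> struct type_tag;

// One explicit specialization per supported element type.
template <> struct type_tag<int>    { static const char* name() { return "NATIVE_INT"; } };
template <> struct type_tag<float>  { static const char* name() { return "NATIVE_FLOAT"; } };
template <> struct type_tag<double> { static const char* name() { return "NATIVE_DOUBLE"; } };

// T is deduced from the pointer, so callers never spell out the tag themselves.
template <typename T>
std::string describe_buffer(const T* buffer, std::size_t count)
{
    assert(buffer != nullptr || count == 0);
    return std::string(type_tag<T>::name()) + " x " + std::to_string(count);
}
```

Calling describe_buffer with a pointer to an unsupported element type (for example bool) would fail to compile, because type_tag<bool> has no definition.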
Then we can use a bit of template forwarding magic to make a function of the right type to output our data. Since this is template code, it needs to live in a header file if you are going to output HDF5 arrays from multiple source files in your programme:
//!_______________________________________________________________________________________
//!
//! write_hdf5 multi_array
//!
//! \author leo Goodstadt (04 March 2013)
//!
//!_______________________________________________________________________________________
#include <string>
#include <vector>
template<typename T, std::size_t DIMENSIONS, typename hdf5_data_type>
void do_write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data, hdf5_data_type& datatype)
{
    // Little endian for x86
    //FloatType datatype(get_hdf5_data_type<T>::type());
    datatype.setOrder(H5T_ORDER_LE);
    // Copy the array extents into the hsize_t buffer that DataSpace expects
    std::vector<hsize_t> dimensions(data.shape(), data.shape() + DIMENSIONS);
    H5::DataSpace dataspace(DIMENSIONS, dimensions.data());
    H5::DataSet dataset = file.createDataSet(data_set_name, datatype, dataspace);
    // multi_array keeps its elements in one contiguous block, so data() can be written directly
    dataset.write(data.data(), datatype);
}
template<typename T, std::size_t DIMENSIONS>
void write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data )
{
    // Pick the HDF5 type that matches T and forward it to the worker function
    get_hdf5_data_type<T> hdf_data_type;
    do_write_hdf5(file, data_set_name, data, hdf_data_type.type);
}
In scientific programming it's common to represent a multidimensional array as one big 1D array and then calculate the corresponding offset from the multidimensional indices, e.g. as seen in the answer by Doc Brown.
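For an arbitrary number of dimensions, the offset calculation can be sketched as below. This assumes row-major (C) ordering, where the last index varies fastest; flat_offset is an illustrative name, not a function from any library mentioned here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (C order) offset of a multidimensional index into a flat buffer.
// Each step scales the running offset by the extent of the next dimension
// and then adds the index along that dimension.
inline std::size_t flat_offset(const std::vector<std::size_t>& index,
                               const std::vector<std::size_t>& shape)
{
    assert(index.size() == shape.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        assert(index[d] < shape[d]);
        offset = offset * shape[d] + index[d];
    }
    return offset;
}
```

For a 5 x 6 array this reduces to j*6 + i, and for a 5 x 6 x 7 array the element at index (1, 2, 3) lands at offset 1*42 + 2*7 + 3 = 59.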
Alternatively, you can overload the subscript operator (operator[]()) to provide an interface that accepts multidimensional indices backed by the 1D array. Or better yet, use a library which does this, such as Boost multi_array. Or, in case your 2D arrays are matrices, you can use a nice C++ linear algebra library such as Eigen.
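A minimal sketch of the operator[] approach for the 2D case, assuming double elements and row-major storage, could look like this. Grid2D is a hypothetical class written for illustration; libraries such as Boost multi_array or Eigen offer far more complete equivalents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Grid2D {
public:
    Grid2D(std::size_t nx, std::size_t ny)
        : nx_(nx), ny_(ny), data_(nx * ny, 0.0) {}

    // Returning a pointer to the start of row j lets callers write grid[j][i].
    double* operator[](std::size_t j)
    {
        assert(j < nx_);
        return data_.data() + j * ny_;
    }
    const double* operator[](std::size_t j) const
    {
        assert(j < nx_);
        return data_.data() + j * ny_;
    }

    // Pointer to the contiguous block, for APIs that take a flat buffer.
    const double* data() const { return data_.data(); }
    std::size_t rows() const { return nx_; }
    std::size_t cols() const { return ny_; }

private:
    std::size_t nx_, ny_;
    std::vector<double> data_;
};
```

Element access keeps the familiar grid[j][i] form, while grid.data() exposes the single block that a routine expecting contiguous memory can read.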