How to append a string to HDF5 dataset with C++?

Tags: c++, hdf5

I'd like to append a string to a one-dimensional HDF5 dataset. The following code successfully appends doubles to the "doubles" dataset in test-doubles.h5, but it segfaults in the dataset.write(str, string_type, mspace, fspace) call when appending strings:

#include <iostream>
#include <string>
#include "H5Cpp.h"
using namespace std;

const int RANK = 1;
H5::StrType string_type(H5::PredType::C_S1, H5T_VARIABLE);

void append_double(H5::DataSet &dataset, double value) {
    // dataspace
    hsize_t dims[RANK] = { 1 };
    hsize_t maxdims[RANK] = { H5S_UNLIMITED };
    H5::DataSpace mspace(RANK, dims, maxdims);

    H5::DataSpace space = dataset.getSpace();
    const hsize_t actual_dim = space.getSimpleExtentNpoints();

    // extend the dataset
    hsize_t new_size[RANK];
    new_size[0] = actual_dim + 1;
    dataset.extend(new_size);

    // select hyperslab.
    H5::DataSpace fspace = dataset.getSpace();
    hsize_t offset[RANK] = { actual_dim };
    hsize_t dims1[RANK] = { 1 };
    fspace.selectHyperslab(H5S_SELECT_SET, dims1, offset);

    dataset.write(&value, H5::PredType::NATIVE_DOUBLE, mspace, fspace);
}

void append_string(H5::DataSet &dataset, string value) {
    // dataspace
    hsize_t dims[RANK] = { 1 };
    hsize_t maxdims[RANK] = { H5S_UNLIMITED };
    H5::DataSpace mspace(RANK, dims, maxdims);

    H5::DataSpace space = dataset.getSpace();
    const hsize_t actual_dim = space.getSimpleExtentNpoints();

    // extend the dataset
    hsize_t new_size[RANK];
    new_size[0] = actual_dim + 1;
    dataset.extend(new_size);

    // select hyperslab.
    H5::DataSpace fspace = dataset.getSpace();
    hsize_t offset[RANK] = { actual_dim };
    hsize_t dims1[RANK] = { 1 };
    fspace.selectHyperslab(H5S_SELECT_SET, dims1, offset);

    const char *str = value.c_str();
    dataset.write(str, string_type, mspace, fspace);

}

int main(int argc, char *argv[]) {
    cout << "start" << endl;
    {
        H5::H5File h5_file("test-doubles.h5", H5F_ACC_TRUNC);

        // create data space with unlimited dimensions for doubles
        hsize_t doubles_dims[RANK] = { 0 };
        hsize_t doubles_maxdims[RANK] = { H5S_UNLIMITED };
        H5::DataSpace doubles_fspace(RANK, doubles_dims, doubles_maxdims);

        // enable chunking for this dataset
        H5::DSetCreatPropList cparms;
        hsize_t chunk_dims[RANK] = { 1 };
        cparms.setChunk(RANK, chunk_dims);

        // create dataset for doubles:
        H5::DataSet d_dataset = h5_file.createDataSet("doubles",
        H5::PredType::NATIVE_DOUBLE, doubles_fspace, cparms);

        // append values to this dataset:
        append_double(d_dataset, 1.0);
        append_double(d_dataset, 2.0);
        append_double(d_dataset, 3.0);

        cout << "doubles written." << endl;
    }

    {
        H5::H5File h5_file("test-strings.h5", H5F_ACC_TRUNC);

        // create data space with unlimited dimensions for strings
        hsize_t str_dims[RANK] = { 0 };
        hsize_t str_maxdims[RANK] = { H5S_UNLIMITED };
        H5::DataSpace str_fspace(RANK, str_dims, str_maxdims);

        // enable chunking for this dataset
        H5::DSetCreatPropList str_cparms;
        hsize_t str_chunk_dims[RANK] = { 1 };
        str_cparms.setChunk(RANK, str_chunk_dims);

        // create dataset for strings:
        H5::DataSet str_dataset = h5_file.createDataSet("strings", string_type, str_fspace, str_cparms);

        // append strings to this dataset:
        append_string(str_dataset, "test1");
        append_string(str_dataset, "test2");
        append_string(str_dataset, "test3");
        cout << "strings written." << endl;
    }

    cout << "all done." << endl;
    return 0;
}

Thanks a lot for your help!

Asked by Frank, Sep 02 '14 09:09


1 Answer

It all works if you replace

dataset.write(str, string_type, mspace, fspace);

with

dataset.write(&str, string_type, mspace, fspace);
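
Why the extra & matters: with a variable-length string type (H5T_VARIABLE), HDF5 treats the write buffer as an array of char * pointers, one per selected element, so it needs the address of str rather than str itself. Passing str makes the library read the characters of "test1" as if they were a pointer, which is a plausible cause of the segfault. The sketch below does not use HDF5 at all; read_vlen_strings and roundtrip_one are hypothetical helpers that only mimic that buffer convention in plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable-length string writer (not an HDF5 API).
// Like a write with an H5T_VARIABLE string type, it interprets `buf` as an
// array of `count` C-string pointers, not as the characters themselves.
std::vector<std::string> read_vlen_strings(const void *buf, std::size_t count) {
    assert(buf != nullptr);
    const char *const *elems = static_cast<const char *const *>(buf);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < count; ++i) {
        out.emplace_back(elems[i]);  // element i is itself a char *
    }
    return out;
}

// Mirrors the corrected call shape from the answer: the buffer argument is
// the address of the pointer (&str), giving a one-element array of char *.
std::string roundtrip_one(const std::string &value) {
    const char *str = value.c_str();
    return read_vlen_strings(&str, 1).front();
}
```

Calling roundtrip_one("test1") hands the helper a one-element array of pointers, which is the same shape that dataset.write(&str, string_type, mspace, fspace) passes to the library. Handing it str directly would make the loop dereference the string's bytes as an address.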

Answered by Walter, Oct 16 '22 19:10