HDF5 rowmajor or colmajor

Tags:

hdf5

Is it possible to know if a matrix stored in HDF5 format is in RowMajor or ColMajor? For example when I save matrices from octave, which stores them internally as ColMajor, I need to transpose them when I read them in my C code where matrices are stored in RowMajor, and vice versa.

asked Jun 09 '14 08:06

remi

2 Answers

HDF5 stores data in row major order:

HDF5 uses C storage conventions, assuming that the last listed dimension is the fastest-changing dimension and the first-listed dimension is the slowest changing.

from the HDF5 User's Guide.

However, if you're using Octave's built-in HDF5 interface, it will automatically transpose the arrays for you. In general, how the data is actually written in the HDF5 file should be completely opaque to the end-user, and the interface should deal with differences in array ordering, etc.
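To make "last listed dimension is the fastest-changing" concrete, here is a minimal sketch of the two indexing conventions. The helper names `row_major_offset` and `col_major_offset` are my own for illustration; they are not part of the HDF5 API.

```c
#include <stddef.h>

/* Flat offset of element (i, j) in a row-major (C, HDF5) array:
 * the last index, j, changes fastest. */
size_t row_major_offset(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Flat offset of element (i, j) in a column-major (Fortran, Octave) array:
 * the first index, i, changes fastest. */
size_t col_major_offset(size_t i, size_t j, size_t nrows)
{
    return j * nrows + i;
}
```

For a 2 x 3 matrix, element (0, 1) sits at offset 1 in row-major order but at offset 2 in column-major order, which is why the same flat buffer looks transposed when it is read with the other convention.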

answered Sep 17 '22 03:09

Yossarian

As @Yossarian pointed out, HDF5 always stores data in row-major order (the C convention). Octave, like Fortran, stores data internally in column-major order.

When writing a matrix from Octave, the HDF5 layer does the transpose for you, so the data is always written as row-major no matter what language you use. This is what makes the file portable.

There is a very good example in the HDF5 User's Guide section 7.3.2.5, as mentioned by @Yossarian. Here's the example (almost) reproduced using Octave:

octave:1> A = [ 1:3; 4:6 ]
A =

   1   2   3
   4   5   6

octave:2> save("-hdf5", "test.h5", "A")
octave:3> quit

~$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   COMMENT "# Created by Octave 3.6.4, Fri Jun 13 08:36:16 2014 MDT <user@localhost>"
   GROUP "A" {
      ATTRIBUTE "OCTAVE_NEW_FORMAT" {
         DATATYPE  H5T_STD_U8LE
         DATASPACE  SCALAR
         DATA {
         (0): 1
         }
      }
      DATASET "type" {
         DATATYPE  H5T_STRING {
            STRSIZE 7;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "matrix"
         }
      }
      DATASET "value" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
         DATA {
         (0,0): 1, 4,
         (1,0): 2, 5,
         (2,0): 3, 6
         }
      }
   }
}
}

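Notice that the dataset is 3 x 2, not 2 x 3: the HDF5 layer has stored the transpose of A as a row-major array. The values appear in the order 1, 4, 2, 5, 3, 6, which is the order of A's elements in Octave's column-major layout.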

Then an example of reading it in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hdf5.h>

#define FILE "test.h5"
#define DS   "A/value"

int
main(int argc, char **argv)
{
        int i = 0;
        int j = 0;
        int n = 0;
        int x = 0;
        int rank = 0;
        hid_t file_id;
        hid_t space_id;
        hid_t dset_id;
        herr_t stat;
        hsize_t *dims = NULL;
        int *data = NULL;

        file_id  = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
        /* The third argument is a dataset access property list, not a dataset id. */
        dset_id  = H5Dopen(file_id, DS, H5P_DEFAULT);

        space_id = H5Dget_space(dset_id);
        n    = H5Sget_simple_extent_npoints(space_id);
        rank = H5Sget_simple_extent_ndims(space_id);

        /* H5Sget_simple_extent_dims() fills an array of hsize_t, not int. */
        dims = malloc(rank*sizeof(hsize_t));
        stat = H5Sget_simple_extent_dims(space_id, dims, NULL);

        printf("rank: %d\t dimensions: ", rank);
        for (i = 0; i < rank; ++i) {
                if (i == 0) {
                        printf("(");
                }
                printf("%llu", (unsigned long long)dims[i]);
                if (i == (rank -1)) {
                        printf(")\n");
                } else {
                        printf(" x ");
                }
        }
        data = malloc(n*sizeof(int));
        memset(data, 0, n*sizeof(int));
        stat  = H5Dread(dset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                         data);


        printf("%s:\n", DS);
        for (i = 0; i < dims[0]; ++i) {
                printf(" [ ");
                for (j = 0; j < dims[1]; ++j) {
                        x = i * dims[1] + j;
                        printf("%d ", data[x]);
                }
                printf("]\n");
        }

        /* Release the buffers allocated with malloc() above. */
        free(data);
        free(dims);

        stat  = H5Sclose(space_id);
        stat  = H5Dclose(dset_id);
        stat  = H5Fclose(file_id);


        return(EXIT_SUCCESS);
}

When compiled and run, this gives:

~$ h5cc -o rmat rmat.c
~$ ./rmat
rank: 2  dimensions: (3 x 2)
A/value:
 [ 1 4 ]
 [ 2 5 ]
 [ 3 6 ]

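This is great, as it means each language gets the matrix in the memory layout it traverses most efficiently. What it does mean, though, is that you have to change how you do your calculations: the C program sees the transpose of the Octave matrix (3 x 2 instead of 2 x 3). For row-major you need to do pre-multiplication, while for column-major you should be doing post-multiplication. This follows from (A B)^T = B^T A^T: a product written as A * x in Octave becomes x^T * A^T on the transposed data. Here is an example; hopefully it explains this a bit more clearly.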
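If you would rather have the matrix in the same orientation as in Octave, you can also transpose it explicitly after reading. This is only a rough sketch: `transpose_int` is a hypothetical helper of my own, not an HDF5 function, and it assumes the data was read into an int buffer as in the program above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of HDF5): copy the row-major
 * rows x cols array `src` into `dst` as its row-major
 * cols x rows transpose. `dst` must hold rows * cols ints. */
void transpose_int(const int *src, int *dst, size_t rows, size_t cols)
{
    size_t i, j;

    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols; ++j) {
            /* Element (i, j) of src becomes element (j, i) of dst. */
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Calling it with the 3 x 2 buffer read from "A/value" (rows = 3, cols = 2) gives a 2 x 3 row-major array with the same rows as A in Octave. For large matrices it is usually cheaper to skip the copy and just swap the indices when accessing elements.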

Does this help?

answered Sep 20 '22 03:09

Timothy Brown
