I have a file in HDF5 format. I know that it is supposed to contain a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find a simple, easy-to-follow tutorial. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?
UPDATE
I found the rhdf5 package, which is not on CRAN but is part of Bioconductor. Its interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I figured out how to save the matrix from within Python as a TSV file, and now that problem is solved.
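For readers who end up with the same TSV workaround, here is a minimal sketch of loading such a file back into an R matrix with base R only. The file path and the values are hypothetical stand-ins for the file exported from Python, and the sketch assumes the TSV has no header row.

```r
# Stand-in for the TSV exported from Python (hypothetical path and values).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("1\t2\t3", "4\t5\t6"), tsv_path)

# Read the tab-separated values and convert the data frame to a matrix.
# header = FALSE assumes the file contains only numbers, no column names.
mat <- as.matrix(read.table(tsv_path, sep = "\t", header = FALSE))

# Drop the automatic V1, V2, ... column names added by read.table.
dimnames(mat) <- NULL

dim(mat)
```

If the exported file does have a header row, `header = TRUE` would be the setting to try instead.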
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhdf5")
And to use it:
library(rhdf5)
List the objects within the file to find the data group you want to read:
h5ls("path/to/file.h5")
Read the HDF5 data:
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
And inspect the structure:
str(mydata)
Note that multidimensional arrays may appear transposed, because HDF5 stores data in row-major order while R uses column-major order. You can also read whole groups, which are returned as named lists in R.
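As a rough end-to-end sketch of the steps above, the following writes a small file first so that it is self-contained. The temporary file, the group name `mygroup`, and the dataset name `mydata` are placeholders for illustration, and the final `t()` call only applies if the file was written by a row-major tool and the matrix comes back with its dimensions swapped.

```r
library(rhdf5)

# Create a small example file (hypothetical names) so the sketch can run alone.
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "mygroup")
m <- matrix(1:6, nrow = 2, ncol = 3)
h5write(m, h5file, "mygroup/mydata")

# List the contents to find the dataset path.
h5ls(h5file)

# Read a single dataset; here it comes back as a 2 x 3 matrix.
mydata <- h5read(h5file, "/mygroup/mydata")
dim(mydata)

# Read the whole group; the result is a named list with one entry per dataset.
grp <- h5read(h5file, "/mygroup")
names(grp)

# If a matrix written by another tool appears transposed, flip it back.
fixed <- t(mydata)
dim(fixed)

# Release any HDF5 handles that are still open.
h5closeAll()
```

Because this file was written from R, the dimensions read back unchanged; the transposition is typically only visible when comparing against tools such as h5py or h5dump.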