Reading HDF5 files [closed]

Question

Is there a way to read HDF5 files using the Scala version of Spark?

It looks like it can be done in Python (via Pyspark), but I can't find anything for Scala.

Timothy Perrigo · Accepted Answer

There isn't a Hadoop InputFormat implementation for HDF5 because an HDF5 file cannot be split at arbitrary points:

Breaking the container into blocks is a bit like taking an axe and chopping it to pieces, severing blindly the content and the smart wiring in the process. The result is a mess, because there's no alignment or correlation between HDFS block boundaries and the internal HDF5 cargo layout or container support structure. Reference

The same site discusses converting HDF5 files to Avro files so that Hadoop/Spark can read them, but the PySpark example you alluded to is probably the simpler way to go. As the linked document mentions, though, a number of technical challenges need to be addressed before HDF5 documents can be worked with efficiently and effectively in Hadoop/Spark.
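As a rough illustration of the PySpark route, here is a minimal sketch of the usual workaround: rather than letting HDFS cut the file into blocks, the driver plans row ranges itself and each task opens the file and reads only its own range. Everything named here is an assumption for illustration, not something from the linked document: the helper names (`plan_slices`, `read_slice`, `load_rdd`), the dataset path `/data`, a 2-D dataset whose row count is known in advance, and a file path that every executor can open (for example on a shared filesystem). It relies on h5py being installed on the executors.

```python
from typing import Iterator, List, Tuple

# (file path, first row, one-past-last row); a hypothetical task description.
Slice = Tuple[str, int, int]


def plan_slices(path: str, n_rows: int, rows_per_task: int) -> List[Slice]:
    """Split n_rows into row ranges that we choose, independent of HDFS blocks."""
    if rows_per_task <= 0:
        raise ValueError("rows_per_task must be positive")
    return [
        (path, start, min(start + rows_per_task, n_rows))
        for start in range(0, n_rows, rows_per_task)
    ]


def read_slice(task: Slice, dataset: str = "/data") -> Iterator[list]:
    """Runs on an executor: open the file and read only the requested rows."""
    import h5py  # imported here so only the executors need h5py installed

    path, start, stop = task
    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset reads just that range from disk.
        for row in f[dataset][start:stop]:
            yield row.tolist()


def load_rdd(sc, path: str, n_rows: int, rows_per_task: int = 10_000):
    """Build an RDD of rows with one Spark partition per planned slice.

    `sc` is an existing SparkContext; `path` must be readable on every executor.
    """
    tasks = plan_slices(path, n_rows, rows_per_task)
    return sc.parallelize(tasks, max(1, len(tasks))).flatMap(read_slice)
```

Only `plan_slices` runs on the driver; the HDF5 library does the actual reading inside each task, so the container is never chopped at block boundaries. The same pattern should carry over to Scala given a JVM HDF5 reader, though that part is not shown here.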

Tags: hdf5, scala, apache-spark

John
