Apache Spark SQL BLOB datatype

While implementing a program with Apache Spark, I ran into a problem processing a table that has a BLOB column.

document_id | content
          2 | 0x123sa..
        ... | ...

org.apache.spark.sql.Row provides getters for many SQL data types, but I haven't found one for BLOB:

sqlContext.sql("SELECT * FROM DOCUMENTS").map(row -> {
   String documentName = row.getString(0);
   Blob documentContents = row.???   // which Row getter returns the BLOB column?
   ....
});

How do I solve the problem?

ovnia asked May 21 '26 08:05



1 Answer

I'd call printSchema() on the SchemaRDD (Spark 1.2.0 or earlier) or DataFrame (Spark 1.3.0) returned by the sql() call, to see exactly what you're getting. This is a good technique whenever you're unsure about a schema. (How the BLOB type gets mapped is up to the implementation of the database connector.) The most likely mapping is BinaryType, in which case the schema would look like:

root
 |-- document_id: string (nullable = ...)
 |-- content: binary (nullable = ...)

In that case, you should be able to extract it with:

row.getAs[Array[Byte]](1) 
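The question declares documentContents as a java.sql.Blob. If downstream code really needs that type rather than a plain byte array, one option (a sketch, not part of the original answer) is to wrap the bytes in the JDK's javax.sql.rowset.serial.SerialBlob. This assumes the connector maps BLOB to BinaryType, so the Row holds a byte[] (from Java, (byte[]) row.get(1) is the equivalent of the Scala call above). The class name DocumentBlobs and the sample bytes are hypothetical.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper class: adapts a BinaryType value (byte[]) to java.sql.Blob.
public class DocumentBlobs {

    // Wraps the raw bytes in an in-memory Blob; SerialBlob copies the array.
    static Blob toBlob(byte[] content) {
        try {
            return new SerialBlob(content);
        } catch (SQLException e) { // SerialException is a subclass of SQLException
            throw new IllegalStateException("could not wrap content as a Blob", e);
        }
    }

    // Reads every byte back out of a Blob (JDBC Blob positions are 1-based).
    static byte[] readAll(Blob blob) {
        try {
            return blob.getBytes(1, (int) blob.length());
        } catch (SQLException e) {
            throw new IllegalStateException("could not read the Blob", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value in the "content" column,
        // e.g. byte[] content = (byte[]) row.get(1); inside the map() lambda.
        byte[] content = {0x12, 0x3a, 0x7f};
        Blob documentContents = toBlob(content);
        System.out.println(Arrays.equals(content, readAll(documentContents)));
    }
}
```

SerialBlob holds the whole array in memory, so this only adapts the type and gives no streaming benefit over the byte[] itself. Since the schema can report content as nullable, check row.isNullAt(1) before converting.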
Spiro Michaylov answered May 22 '26 23:05



