While implementing a program with Apache Spark, I ran into a problem processing a table that has a BLOB column.
document_id | content
2           | 0x123sa..
......
org.apache.spark.sql.Row supports various SQL data types, but I haven't found a BLOB type:
sqlContext.sql("SELECT * FROM DOCUMENTS").map(row -> {
    String documentName = row.getString(0);
    Blob documentContents = row.???
    ....
});
How do I solve the problem?
I'd call printSchema() on the SchemaRDD (Spark 1.2.0 or earlier) or DataFrame (Spark 1.3.0) returned by the sql() call to see exactly what you're getting; it's a good technique whenever you're confused about the schema. (How the BLOB type is mapped is up to the implementation of the database connector.) The most likely mapping is BinaryType, in which case the schema would look like:
root
 |-- document_id: string (nullable = ...)
 |-- content: binary (nullable = ...)
In that case, you should be able to extract the contents (shown here in Scala syntax) using
row.getAs[Array[Byte]](1)
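Since the question uses Java, here is a minimal sketch of what could be done with the bytes once they are out of the row. It assumes the column really is mapped to BinaryType, so that the value arrives in Java as a byte[] (for example via a cast such as (byte[]) row.get(1), which is an assumption about your schema). BinaryColumnSketch, toBlob and toHex are hypothetical names made up for illustration and are not part of Spark. If the rest of your code expects a java.sql.Blob, the JDK's SerialBlob can wrap the array:

```java
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper for illustration only; not part of Spark.
final class BinaryColumnSketch {

    // In the Spark job the bytes would come from the row, e.g.
    //   byte[] bytes = (byte[]) row.get(1);  // assumes column 1 is BinaryType
    // Wraps the raw bytes in a java.sql.Blob using the JDK's SerialBlob.
    static Blob toBlob(byte[] bytes) throws SQLException {
        return new SerialBlob(bytes);
    }

    // Renders the bytes as a hex string, handy for logging or debugging.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

In many cases working with the byte[] directly is simpler than going through Blob; the wrapper only matters if downstream code needs the java.sql.Blob interface. Note that Blob positions are 1-based, so blob.getBytes(1, (int) blob.length()) returns the full contents.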