Apache Spark SQL BLOB datatype

Question

While implementing a program with Apache Spark, I ran into a problem processing a table that has a BLOB column.

document_id | content
          2 | 0x123sa..
        ......

org.apache.spark.sql.Row provides getters for various SQL datatypes, but I haven't found one for a BLOB type:

sqlContext.sql("SELECT * FROM DOCUMENTS").map(row -> {
   String documentName = row.getString(0);
   Blob documentContents = row.???
   ....
});

How do I solve the problem?

Spiro Michaylov · Accepted Answer

I'd call printSchema() on the SchemaRDD (Spark 1.2.0 or earlier) or DataFrame (Spark 1.3.0) returned by the sql() call to see exactly what you're getting. This is a good technique to use whenever you're confused about the schema. (It's up to the implementation of the database connector to decide how to map the type.) The most likely mapping for a BLOB column is BinaryType, in which case the schema would look like:

root
 |-- document_id: string (nullable = ...)
 |-- content: binary (nullable = ...)

In that case you should be able to extract the content as a byte array (Scala syntax):

row.getAs[Array[Byte]](1)
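
The call above is Scala. Since the question is written in Java, a rough Java counterpart would be byte[] content = row.getAs(1); assuming the connector really does map the column to BinaryType. If a java.sql.Blob is needed afterwards, the bytes can be wrapped with the JDK's SerialBlob. The sketch below shows only that wrapping step, using the standard library alone; BlobColumns and toBlob are hypothetical names chosen for illustration, not part of Spark.

```java
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper (not a Spark class): turns the byte[] read from a
// binary column into a java.sql.Blob.
final class BlobColumns {
    private BlobColumns() {}

    // Wraps the raw bytes in an in-memory Blob.
    // A null input (SQL NULL in the column) is passed through as null.
    static Blob toBlob(byte[] content) throws SQLException {
        return content == null ? null : new SerialBlob(content);
    }
}
```

Inside the map function this would be used as Blob documentContents = BlobColumns.toBlob(content); where content is the byte array taken from the row. SerialBlob holds the data in memory, so for large documents it may be simpler to keep working with the byte[] directly.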

Tags: java, sql, mysql, apache-spark

ovnia
