Spark 2.0 Dataset Encoder with trait

I'm building a Dataset in which each record is mapped to a case class (e.g. CustomDataEntry with primitive types).

val dataset = spark.read(...).as[CustomDataEntry]
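
Concretely, here is a minimal, self-contained sketch of this step. The field names (id, lat, lon), the CSV source, and the path are assumptions for illustration; they are not given in the question.

import org.apache.spark.sql.{Dataset, SparkSession}

case class CustomDataEntry(id: Long, lat: Double, lon: Double) // hypothetical fields

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val dataset: Dataset[CustomDataEntry] = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data/entries.csv") // hypothetical source
  .as[CustomDataEntry]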

So far, so good.

Now I'm writing a transformer that takes a Dataset of CustomDataEntry's, does some computations, and adds some new columns, e.g. finding the latitude and longitude and calculating a geohash.

My records now have a property/column (geohash) that is not present in the case class, but is present in the Dataset. Again, this works fine, but it doesn't seem right and is not type safe (if type safety is even possible with Encoders).
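
To make the type-safety gap concrete, here is an untyped sketch of such a transformer. The geohash computation is a placeholder (a real implementation would call a geohash library), and the lat/lon field names are assumed:

import org.apache.spark.sql.functions.{col, udf}

// Placeholder "geohash"; not a real geohash algorithm
val geohashUdf = udf((lat: Double, lon: Double) => f"$lat%.4f:$lon%.4f")

val withGeohash = dataset
  .withColumn("geohash", geohashUdf(col("lat"), col("lon")))
  .as[CustomDataEntry] // compiles and runs: extra columns are tolerated,
                       // but geohash is invisible to the type system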

I could add this as an Option field in my case class, but that seems messy and not composable. A far better way, it seems, would be to mix in a trait on CustomDataEntry,

e.g.

trait Geo {
    val geohash: String
}

and then return the dataset as

dataset.as[CustomDataEntry with Geo]

This will not work:

Error:(21, 10) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. .as[CustomDataEntry with Geo]

The answer seems obvious (not supported, future versions), but maybe I'm overlooking something?

Asked Oct 13 '16 by Tom Lous

1 Answer

Encoders aren't there yet, IMHO, but you can use Encoders.kryo[CustomDataEntry with Geo] as a workaround.
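
A minimal sketch of that workaround, reusing the hypothetical fields from above (the geohash value is a placeholder). One caveat: Encoders.kryo serializes each object into a single binary value column, so you lose the columnar schema and the ability to query individual fields with Dataset/SQL operators:

import org.apache.spark.sql.{Dataset, Encoder, Encoders}

implicit val geoEncoder: Encoder[CustomDataEntry with Geo] =
  Encoders.kryo[CustomDataEntry with Geo]

val withGeo: Dataset[CustomDataEntry with Geo] = dataset.map { e =>
  // anonymous subclass of the case class, mixing in the trait
  new CustomDataEntry(e.id, e.lat, e.lon) with Geo {
    val geohash: String = "u173zq" // placeholder; compute from e.lat / e.lon
  }
}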

Answered by Viacheslav Rodionov