I'm building a Dataset in which each record is mapped to a case class (e.g. CustomDataEntry with primitive types).
val dataset = spark.read(...).as[CustomDataEntry]
So far, so good.
Now I'm writing a transformer that takes a Dataset of CustomDataEntry, does some computations, and adds some new columns, e.g. looking up the latitude and longitude and calculating a geohash.
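For illustration, a minimal sketch of such a transformer (the lat/lon fields on CustomDataEntry and the computeGeohash helper are assumptions, not from my actual code):

import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.udf

// hypothetical geohash helper wrapped as a UDF
val toGeohash = udf((lat: Double, lon: Double) => computeGeohash(lat, lon))

def withGeohash(ds: Dataset[CustomDataEntry]): DataFrame =
  // withColumn returns an untyped DataFrame, which is exactly where
  // the type-safety question below comes from
  ds.withColumn("geohash", toGeohash(ds("lat"), ds("lon")))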
My result now has a property/column (geohash) which is present in the Dataset but not in the case class. Again, this works fine, but it doesn't seem right and isn't type safe (if type safety is even possible here with Encoders).
I could add it as an Option field in my case class, but that seems messy and not composable. A far better way, it seems, would be to mix a trait into CustomDataEntry,
e.g.
trait Geo {
  val geohash: String
}
and then return the dataset as
dataset.as[CustomDataEntry with Geo]
This does not work:
Error:(21, 10) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
.as[CustomDataEntry with Geo]
The answer seems obvious (not supported, future versions), but maybe I'm overlooking something?
Encoders are not there yet for trait mixins IMHO, but you can use Encoders.kryo[CustomDataEntry with Geo] as a workaround.
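A minimal sketch of how that can look (assuming a CustomDataEntry(lat: Double, lon: Double) constructor and a hypothetical computeGeohash helper):

import org.apache.spark.sql.{Dataset, Encoder, Encoders}

implicit val geoEncoder: Encoder[CustomDataEntry with Geo] =
  Encoders.kryo[CustomDataEntry with Geo]

val enriched: Dataset[CustomDataEntry with Geo] =
  dataset.map { entry =>
    // anonymous subclass that mixes the trait into the case class
    new CustomDataEntry(entry.lat, entry.lon) with Geo {
      val geohash: String = computeGeohash(entry.lat, entry.lon)
    }
  }

Be aware that a Kryo encoder serializes each object into a single binary column, so you lose the columnar view of the fields and most Catalyst optimizations for these rows.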