I am writing a custom spark.ml transformer by extending Transformer.
Everything works fine, but I am not able to save an instance of this transformer, since it does not extend the DefaultParamsWritable trait as the built-in transformers do, and I cannot mix in DefaultParamsWritable directly either, because its dependencies are package-private to org.apache.spark.ml.
One workaround is to put your class under the org.apache.spark.ml package. Is this the only way to achieve this, or is there a better solution?
Finally found a way of doing this!
So the trick has two steps.
If you plan on coding a transformer that has parameters that need to be written when it is saved, then those parameters need to live in a trait that extends the org.apache.spark.ml.param.Params class.
The common traits like HasInputCol are private to the Spark ML package, so you need to re-implement those yourself in a public util package of your own choice. (There is an issue on their JIRA board to make these public, but it has no fix date yet.)
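As a sketch of that re-implementation step: the trait below mirrors Spark's private[ml] HasInputCol in a public package of your own (the package name com.example.ml.param is just a placeholder):

```scala
package com.example.ml.param

import org.apache.spark.ml.param.{Param, Params}

/**
 * Public re-implementation of Spark's private[ml] HasInputCol trait,
 * so it can be mixed into transformers outside org.apache.spark.ml.
 */
trait HasInputCol extends Params {
  // Param definition mirrors the one in Spark's shared params
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")

  final def getInputCol: String = $(inputCol)
}
```

You would repeat this pattern for any other shared trait you need, such as HasOutputCol.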
Once you have this, your transformer can simply mix in both your Params trait and DefaultParamsWritable, and your transformer is now persistable.
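Putting it together, here is a minimal sketch of such a transformer, assuming a public HasInputCol trait like the one described above (the class name UpperCaseTransformer and its behavior are illustrative only):

```scala
package com.example.ml

import com.example.ml.param.HasInputCol // your public re-implementation
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.upper
import org.apache.spark.sql.types.StructType

class UpperCaseTransformer(override val uid: String)
    extends Transformer with HasInputCol with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("upperCase"))

  def setInputCol(value: String): this.type = set(inputCol, value)

  // Example behavior: upper-case the configured input column in place
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(inputCol), upper(dataset($(inputCol))))

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}

// Companion object gives you load() for free via DefaultParamsReadable
object UpperCaseTransformer extends DefaultParamsReadable[UpperCaseTransformer]
```

With this in place, instance.write.save(path) and UpperCaseTransformer.load(path) should work, and the inputCol param value is persisted along with the uid.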
Really wish this was documented somewhere.