
How to create custom writable transformer?

I am writing a custom spark.ml transformer by extending Transformer.

Everything works, except that I cannot save an instance of this transformer: it does not extend the DefaultParamsWritable trait as the built-in transformers do, and I cannot mix in DefaultParamsWritable directly either, because it is private to the org.apache.spark.ml package.

One workaround is to put the class under org.apache.spark.ml. Is this the only way to achieve this? Are there any better solutions?

Pravin Gadakh asked Nov 09 '22 17:11


1 Answer

Finally found a way of doing this!

So the trick has two steps.

If your transformer has parameters that need to be written when it is saved, declare them in a trait that extends the org.apache.spark.ml.param.Params trait.

The common traits like HasInputCol are private to the Spark ML package, so you need to reimplement those as well in a public util package of your own choice. (There is an issue on their JIRA board to make these public, but it has no fix date yet.)
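As a minimal sketch of what such reimplemented traits could look like (the param descriptions and the idea of placing them in a package such as com.example.ml.util are assumptions for illustration, not Spark's own source):

```scala
import org.apache.spark.ml.param.{Param, Params}

// Hypothetical public stand-ins for Spark's package-private shared param traits.
// Put them in a package of your own, e.g. com.example.ml.util (assumed name).
trait HasInputCol extends Params {
  // The param is registered on the owning instance via `this`.
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")

  final def getInputCol: String = $(inputCol)
}

trait HasOutputCol extends Params {
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "output column name")

  final def getOutputCol: String = $(outputCol)
}
```

Because the params are declared as `Param` members of a `Params` trait, they show up in the instance's `params` array, which is what the default writer relies on when it serializes the parameter values.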

Once you have these, your transformer can simply mix in both your Params traits and DefaultParamsWritable, and it becomes persistable.
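Putting the two steps together might look like the sketch below. It assumes a Spark version in which DefaultParamsWritable and DefaultParamsReadable are accessible from user code, as this answer implies; `UpperCaser` and the local `HasInputCol`/`HasOutputCol` traits are hypothetical names made up for the example.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Step 1: your own public param traits (hypothetical re-implementations).
trait HasInputCol extends Params {
  final val inputCol: Param[String] = new Param[String](this, "inputCol", "input column name")
  final def getInputCol: String = $(inputCol)
}

trait HasOutputCol extends Params {
  final val outputCol: Param[String] = new Param[String](this, "outputCol", "output column name")
  final def getOutputCol: String = $(outputCol)
}

// Step 2: mix the param traits and DefaultParamsWritable into the transformer.
// The (uid: String) constructor is kept because defaultCopy and the default
// reader instantiate the class through it.
class UpperCaser(override val uid: String)
    extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("upperCaser"))

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Example logic only: upper-case the input column into the output column.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(col($(inputCol))))

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}

// Companion object so the saved transformer can be read back with UpperCaser.load(path).
object UpperCaser extends DefaultParamsReadable[UpperCaser]
```

With an active SparkSession, saving should then be a matter of `new UpperCaser().setInputCol("text").setOutputCol("upper").write.overwrite().save(path)`, and `UpperCaser.load(path)` reads it back. The default reader looks the class up by name, so the class likely needs to be a top-level class rather than one defined inside another class or a REPL cell.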

Really wish this was documented somewhere.

Subramaniam Ramasubramanian answered Nov 14 '22 21:11