I use a Spark ML pipeline with various custom UDF-based transformers. What I'm looking for is a way to serialize/deserialize this pipeline.
I serialize the PipelineModel using
ObjectOutputStream.write()
However, whenever I try to deserialize the pipeline I get:
java.lang.ClassNotFoundException: org.sparkexample.DateTransformer
where DateTransformer is my custom transformer. Is there a method/interface to implement for proper serialization?
I've found out there is an
MLWritable
interface that might be implemented by my class (DateTransformer extends Transformer), but I can't find a useful example of it.
If you are using Spark 2.x+, extend your transformer with DefaultParamsWritable, for example:
class ProbabilityMaxer(override val uid: String) extends Transformer with DefaultParamsWritable {
Make sure the class has a constructor taking a String uid (the reader instantiates the class with the uid stored in the saved metadata), and add a no-argument auxiliary constructor so `new ProbabilityMaxer()` keeps working:
def this() = this(Identifiable.randomUID("probabilityMaxer"))
Finally, for a successful read, add a companion object:
object ProbabilityMaxer extends DefaultParamsReadable[ProbabilityMaxer]
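Putting the pieces above together, here is a minimal compilable sketch, assuming Spark 2.x+. The transform logic (taking the row-wise max of two made-up probability columns "p1" and "p2") is purely illustrative; only the uid plumbing and the two traits matter for serialization:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, greatest}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

class ProbabilityMaxer(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  // No-arg constructor for normal use; the String-uid primary constructor
  // is what the reader calls when loading the stage back from disk.
  def this() = this(Identifiable.randomUID("probabilityMaxer"))

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("maxProb", greatest(col("p1"), col("p2")))

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("maxProb", DoubleType, nullable = true))

  override def copy(extra: ParamMap): ProbabilityMaxer = defaultCopy(extra)
}

// Companion object: provides ProbabilityMaxer.load(path) for deserialization.
object ProbabilityMaxer extends DefaultParamsReadable[ProbabilityMaxer]
```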
I have this working on my production server. I will add a GitLab link to the project later, once I have uploaded it.
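For the save/load calls themselves, use the ML writer rather than ObjectOutputStream. A minimal round-trip sketch using a built-in stage (SQLTransformer) so it is self-contained; the path and column names are made up, and a custom DefaultParamsWritable transformer plugs into exactly the same calls:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("pipeline-io").getOrCreate()
import spark.implicits._

val df = Seq((0.2, 0.9), (0.7, 0.1)).toDF("p1", "p2")

// Any stage mixing in DefaultParamsWritable persists the same way.
val stage = new SQLTransformer().setStatement(
  "SELECT *, greatest(p1, p2) AS maxProb FROM __THIS__")

val model = new Pipeline().setStages(Array(stage)).fit(df)

// Persist with the ML writer, not Java serialization.
model.write.overwrite().save("/tmp/pipeline-io-demo")

// Loading restores each stage via its companion reader.
val reloaded = PipelineModel.load("/tmp/pipeline-io-demo")
reloaded.transform(df).show()
```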