Is there any means to serialize custom Transformer in Spark ML Pipeline

I use an ML Pipeline with various custom UDF-based transformers. What I'm looking for is a way to serialize/deserialize this pipeline.

I serialize the PipelineModel using

ObjectOutputStream.write() 
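For context, roughly like this (assuming model is the fitted PipelineModel; the file path is just an example):

    import java.io.{FileOutputStream, ObjectOutputStream}

    // Plain Java serialization of the fitted model. Writing succeeds,
    // but deserialization needs the same classes on the classpath.
    val oos = new ObjectOutputStream(new FileOutputStream("/tmp/pipeline.ser"))
    oos.writeObject(model)
    oos.close()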

However, whenever I try to deserialize the pipeline, I get:

java.lang.ClassNotFoundException: org.sparkexample.DateTransformer

where DateTransformer is my custom transformer. Is there a method/interface to implement for proper serialization?

I've found that there is an

MLWritable

interface that my class could implement (DateTransformer extends Transformer), but I can't find a useful example of it.

asked Oct 27 '16 by Igor Kustov

1 Answer

If you are using Spark 2.x or later, extend your transformer with DefaultParamsWritable.

For example:

class ProbabilityMaxer(override val uid: String) extends Transformer with DefaultParamsWritable {

The primary constructor takes the uid as a string parameter; the reader invokes it reflectively on load to reconstruct the instance. Add a no-argument auxiliary constructor that generates a random uid for normal use:

  def this() = this(Identifiable.randomUID("probabilityMaxer"))

Finally, for a successful read, add a companion object:

object ProbabilityMaxer extends DefaultParamsReadable[ProbabilityMaxer]
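Putting the pieces together, here is a minimal sketch of a complete transformer. The transform logic is a placeholder pass-through, and the class name and uid prefix are just examples:

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
    import org.apache.spark.sql.{DataFrame, Dataset}
    import org.apache.spark.sql.types.StructType

    // The primary constructor takes the uid; on load, the reader
    // reflectively invokes a constructor with a single String argument.
    class ProbabilityMaxer(override val uid: String)
        extends Transformer with DefaultParamsWritable {

      // Auxiliary constructor for normal use; generates a random uid.
      def this() = this(Identifiable.randomUID("probabilityMaxer"))

      // Placeholder logic: pass the data through unchanged.
      override def transform(dataset: Dataset[_]): DataFrame = dataset.toDF()

      override def transformSchema(schema: StructType): StructType = schema

      override def copy(extra: ParamMap): ProbabilityMaxer = defaultCopy(extra)
    }

    // Companion object supplying the reader used by load().
    object ProbabilityMaxer extends DefaultParamsReadable[ProbabilityMaxer]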

I have this working on my production server. I will add a GitLab link to the project later when I upload it.
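For reference, usage would look something like this (trainingDf and the save path are hypothetical placeholders):

    import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}

    val pipeline = new Pipeline().setStages(Array[PipelineStage](new ProbabilityMaxer()))
    val model = pipeline.fit(trainingDf)  // trainingDf: a hypothetical input DataFrame

    // Save with the ML persistence mechanism rather than Java serialization
    model.write.overwrite().save("/tmp/probability-pipeline")

    // Loading works because the companion object provides the reader
    val restored = PipelineModel.load("/tmp/probability-pipeline")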

answered Sep 29 '22 by Ganesh Krishnan