Context:
I have a Spark ML pipeline that contains a VectorAssembler, StringIndexer, and a DecisionTreeClassifier. Using this pipeline I am able to successfully fit the model and transform my data frame. I would like to store this model for future use, but I keep getting the following error:
Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable.
Non-Writable stage: dtc_9c04161ed2d1 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
What I have tried:
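import org.apache.spark.ml.Pipeline
// assembler: VectorAssembler, labelIndexer: StringIndexer, dt: DecisionTreeClassifier
val pipeline = new Pipeline().setStages(Array(assembler, labelIndexer, dt))
val model = pipeline.fit(dfIndexed)
// The error is raised here, because the fitted decision tree stage is not Writable
model.write.overwrite().save("test/model/pipeline")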
This works properly when I remove the classifier (i.e. dt). Is there a way of saving a DecisionTreeClassifier model?
My data consists of some indexed categorical values that I must map back to their original form (I know this will require using IndexToString). I am using Spark 1.6.
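For reference, the IndexToString step mentioned above could look roughly like the sketch below. It is written against the Spark 1.6 API and uses hypothetical toy data and column names (`category`, `categoryIndex`, `originalCategory`) that are not from my real pipeline; in the actual pipeline the input column would be the classifier's `prediction` column rather than the indexed column itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
import org.apache.spark.sql.SQLContext

// Local context for the sketch; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("index-to-string-sketch"))
val sqlContext = new SQLContext(sc)

// Hypothetical toy data: an id and a categorical string column.
val df = sqlContext.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "a"), (3, "c")
)).toDF("id", "category")

// Fit the indexer so the string-to-index mapping (labels) is known.
val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// Map indices back to the original strings using the fitted labels.
val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)

val decoded = converter.transform(indexed)
decoded.select("category", "categoryIndex", "originalCategory").show()
```

The labels are passed explicitly from the fitted StringIndexerModel, so the converter does not depend on column metadata surviving the intermediate stages.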
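This cannot be done as of Spark 1.6: DecisionTreeClassificationModel does not implement Writable in that release, so saving any fitted pipeline that contains it fails with the error shown in the question. The issue is being tracked here.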
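A workaround that is sometimes used while the ML persistence API lacks support is plain Java serialization of the fitted PipelineModel through `saveAsObjectFile`. This is not the Pipeline writer, and it is only a minimal sketch under assumptions: the toy data, the column names (`f1`, `f2`, `outcome`), and the temporary path are hypothetical, and a model serialized this way should be expected to load only with the same Spark version and classpath that wrote it.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SQLContext

val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("dt-save-sketch"))
val sqlContext = new SQLContext(sc)

// Hypothetical toy data: two numeric features and a string label.
val df = sqlContext.createDataFrame(Seq(
  (1.0, 0.0, "yes"), (0.9, 0.1, "yes"), (0.0, 1.0, "no"), (0.1, 0.9, "no")
)).toDF("f1", "f2", "outcome")

val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
val labelIndexer = new StringIndexer()
  .setInputCol("outcome")
  .setOutputCol("label")
val dt = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

val model = new Pipeline().setStages(Array(assembler, labelIndexer, dt)).fit(df)

// saveAsObjectFile refuses to write to an existing path, so use a fresh one.
val path = Files.createTempDirectory("dt-model").resolve("pipeline").toString

// Serialize the whole fitted pipeline as a single-element RDD.
sc.parallelize(Seq(model), 1).saveAsObjectFile(path)

// Later (same Spark version): read the object back and use it as usual.
val restored = sc.objectFile[PipelineModel](path).first()
restored.transform(df).select("outcome", "prediction").show()
```

Because this relies on Java serialization rather than a stable on-disk format, it is best treated as a stopgap until the model type supports `write` directly.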