I am trying to save thousands of models produced by an ML Pipeline. As indicated in the answer here, the models can be saved as follows:
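import java.io._
import org.apache.spark.ml.PipelineModel

// Serialize a fitted PipelineModel with plain Java serialization.
// Note: FileOutputStream only writes to the local filesystem.
def saveModel(name: String, model: PipelineModel): Unit = {
  val oos = new ObjectOutputStream(new FileOutputStream(s"/some/path/$name"))
  oos.writeObject(model)
  oos.close()
}

schools.zip(bySchoolArrayModels).foreach {
  case (name, model) => saveModel(name, model)
}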
I have tried using s3://some/path/$name and /user/hadoop/some/path/$name as the path, since I would like the models to be saved to Amazon S3 eventually, but both fail with messages indicating that the path cannot be found.
How to save models to Amazon S3?
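The attempts above fail because java.io.FileOutputStream only writes to the local filesystem and does not understand s3:// or HDFS URIs. Below is a minimal sketch that routes the same Java serialization through Hadoop's FileSystem API instead. It assumes Hadoop's client libraries are on the classpath (as they are in a Spark application) and that an S3 connector is configured, such as s3a:// via hadoop-aws or s3:// via EMRFS on Amazon EMR. HadoopModelIO, saveObject and loadObject are hypothetical helper names, not part of Spark or Hadoop.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object HadoopModelIO {
  // Serialize an object to any Hadoop-supported URI (file://, hdfs://, s3a://, ...).
  // The URI scheme decides which FileSystem implementation is used.
  def saveObject(obj: java.io.Serializable, uri: String, conf: Configuration): Unit = {
    val path = new Path(uri)
    val fs = path.getFileSystem(conf)
    val oos = new ObjectOutputStream(fs.create(path, true)) // true = overwrite
    try oos.writeObject(obj) finally oos.close()
  }

  // Read the object back and cast it to the type the caller expects.
  def loadObject[T](uri: String, conf: Configuration): T = {
    val path = new Path(uri)
    val fs = path.getFileSystem(conf)
    val ois = new ObjectInputStream(fs.open(path))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }
}
```

In the question's loop this could be called as HadoopModelIO.saveObject(model, s"s3a://my-bucket/models/$name", spark.sparkContext.hadoopConfiguration), where my-bucket is a placeholder. Java serialization ties the stored bytes to the exact class versions on the classpath, so files written this way may not load under a different Spark version.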
You can save your model by using the save method of MLlib models; after storing it, you can load it in another application. As @zero323 stated before, another way to achieve this is to use the Predictive Model Markup Language (PMML).
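Applied to the question's loop, a sketch using the ML writer/reader API (available for PipelineModel since Spark 1.6) might look like this. It assumes the models are fitted PipelineModel instances, that the school names are strings, and that baseUri points to an S3 prefix reachable through a configured connector. SaveSchoolModels, saveAll and loadOne are hypothetical names.

```scala
import org.apache.spark.ml.PipelineModel

object SaveSchoolModels {
  // Write each fitted pipeline to its own directory under baseUri,
  // e.g. baseUri = "s3a://my-bucket/models" (hypothetical bucket).
  // overwrite() replaces an existing directory instead of throwing.
  def saveAll(baseUri: String, schools: Seq[String], models: Seq[PipelineModel]): Unit =
    schools.zip(models).foreach { case (name, model) =>
      model.write.overwrite().save(s"$baseUri/$name")
    }

  // Load one model back, possibly from a different application.
  def loadOne(baseUri: String, name: String): PipelineModel =
    PipelineModel.load(s"$baseUri/$name")
}
```

ML persistence writes each model as a directory (JSON metadata plus Parquet data for the fitted stages) rather than a single file, so every school ends up with its own subdirectory. With the question's variables this would be called as saveAll(baseUri, schools.toSeq, bySchoolArrayModels.toSeq).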
Choosing Between Spark MLlib and Spark ML
At first glance, the most obvious difference between MLlib and ML is the data types they work on: MLlib supports RDDs, while ML supports DataFrames and Datasets.
ML Pipelines provide a uniform set of high-level APIs built on top of DataFrames that help users create and tune practical machine learning pipelines.
An ML pipeline (or ML workflow) is a sequence of Transformers and Estimators used to fit a PipelineModel to an input dataset:
pipeline: DataFrame =[fit]=> DataFrame (using transformers and estimators)
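To make the fit/transform flow concrete, here is a sketch modeled on the text-classification example in the Spark ML guide. The column names text and label, the toy rows, and the PipelineSketch name are assumptions for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object PipelineSketch {
  // Transformers (Tokenizer, HashingTF) are applied as-is; the Estimator
  // (LogisticRegression) is fitted. fit() returns a PipelineModel.
  def fitPipeline(training: DataFrame): PipelineModel = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pipeline-sketch").getOrCreate()
    import spark.implicits._
    // Toy labelled data for illustration only.
    val training = Seq(("spark is great", 1.0), ("hadoop mapreduce", 0.0)).toDF("text", "label")
    val model = fitPipeline(training)
    // The fitted PipelineModel is itself a Transformer: DataFrame => DataFrame.
    model.transform(training).select("text", "prediction").show()
    spark.stop()
  }
}
```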
One way to save a model to HDFS is as follows:
// persist model to HDFS
sc.parallelize(Seq(model), 1).saveAsObjectFile("hdfs:///user/root/linReg.model")
The saved model can then be loaded as:
val linRegModel = sc.objectFile[LinearRegressionModel]("hdfs:///user/root/linReg.model").first()
For more details, see (ref).
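Because saveAsObjectFile and objectFile go through Hadoop's filesystem layer, the same trick should also accept an s3a:// URI, assuming an S3 connector is configured. A minimal generic sketch; ObjectFileIO is a hypothetical helper name.

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext

object ObjectFileIO {
  // Store a single serializable object as a one-partition object file.
  // By default this fails if the output directory already exists.
  def save[T: ClassTag](sc: SparkContext, obj: T, uri: String): Unit =
    sc.parallelize(Seq(obj), numSlices = 1).saveAsObjectFile(uri)

  // Read it back; first() returns the single stored element.
  def load[T: ClassTag](sc: SparkContext, uri: String): T =
    sc.objectFile[T](uri).first()
}
```

Each call runs a small Spark job, which adds up when saving thousands of models one at a time.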
Since Apache Spark 1.6, in the Scala API, you can save your models without any tricks, because all models from the ML library come with a save method. You can check this in LogisticRegressionModel, which indeed has that method. To load the model, you can use the static load method (defined on the model's companion object):
val logRegModel = LogisticRegressionModel.load("myModel.model")
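A self-contained sketch of that save/load round trip, assuming a local SparkSession and a toy training set. The data, the default path, and the names LogRegPersistence and fitAndSave are illustrative only.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LogRegPersistence {
  // Fit a toy model on the default "label"/"features" columns and save it.
  def fitAndSave(spark: SparkSession, path: String): LogisticRegressionModel = {
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")
    val model = new LogisticRegression().setMaxIter(10).fit(training)
    // Any Hadoop-supported URI (hdfs://, s3a://) should work the same way.
    model.write.overwrite().save(path)
    model
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("logreg-save").getOrCreate()
    val path = if (args.nonEmpty) args(0) else "/tmp/myModel.model"
    val model = fitAndSave(spark, path)
    // load lives on the companion object, so no instance is needed.
    val restored = LogisticRegressionModel.load(path)
    println(s"coefficients match: ${restored.coefficients == model.coefficients}")
    spark.stop()
  }
}
```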