How to save/export a Spark ML Lib model to PMML?

I'd like to train a model using Spark ML Lib but then be able to export the model in a platform-agnostic format. Essentially I want to decouple how models are created and consumed.

My reason for wanting this decoupling is so that I can deploy a model in other projects. E.g.:

  • Use the model to perform predictions in a separate standalone program which doesn't depend on Spark for the evaluation.
  • Use the model with existing projects such as OpenScoring and provide APIs which can make use of the model.
  • Load an existing model back into Spark for high throughput prediction.

Has anyone done something like this with Spark ML Lib?

asked Apr 15 '15 by trianta2
People also ask

How do you save a model on MLlib?

You can save a model by calling the save method on MLlib models, then load it back in another application. As @zero323 noted, another way to achieve this is the Predictive Model Markup Language (PMML).
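As a minimal sketch of the save/load round trip mentioned above (Spark 1.4+ MLlib API; the toy data and the `/tmp` path are placeholders for illustration):

```scala
// Sketch: persisting and reloading an MLlib model across applications.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext(
  new SparkConf().setAppName("save-model").setMaster("local[*]"))

// Toy training data, purely for illustration.
val data = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(1.0, 1.0),
  Vectors.dense(8.0, 9.0),
  Vectors.dense(9.0, 8.0)
))

val model = KMeans.train(data, k = 2, maxIterations = 20)

// Writes the model (Parquet data plus JSON metadata) to the given path.
model.save(sc, "/tmp/kmeans-model")

// Later, possibly in a separate program with its own SparkContext:
val reloaded = KMeansModel.load(sc, "/tmp/kmeans-model")
println(reloaded.predict(Vectors.dense(0.5, 0.5)))
```

Note this format is Spark-specific: the loading side still needs Spark, which is exactly why the question asks about PMML as a platform-agnostic alternative.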

Is spark MLlib deprecated?

MLlib automated MLflow tracking is deprecated on clusters that run Databricks Runtime 10.1 ML and above, and it is disabled by default on clusters running Databricks Runtime 10.2 ML and above. Instead, use MLflow PySpark ML autologging by calling mlflow.pyspark.ml.autolog().


1 Answer

Spark 1.4 now has support for this. See the latest documentation. Not all models are supported yet (see the JIRA issue SPARK-4587 for the ones still to be added).
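A short sketch of that PMML export API (Scala side, Spark 1.4+). Only models mixing in PMMLExportable support it, e.g. KMeansModel, LinearRegressionModel, RidgeRegressionModel, LassoModel, and SVMModel; the data and output path below are placeholders:

```scala
// Sketch: exporting an MLlib model to PMML XML (Spark 1.4+).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext(
  new SparkConf().setAppName("pmml-export").setMaster("local[*]"))

val data = sc.parallelize(Seq(
  Vectors.dense(1.0, 1.0),
  Vectors.dense(9.0, 9.0)
))
val model = KMeans.train(data, k = 2, maxIterations = 10)

// Get the PMML document as a String...
println(model.toPMML())

// ...or write it directly to a path (local or HDFS via the SparkContext).
model.toPMML(sc, "/tmp/kmeans.pmml")
```

The resulting XML can then be served by a PMML scoring engine such as OpenScoring, with no Spark dependency on the consuming side.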

HTHs

answered Oct 02 '22 by user2051561