I am running a logistic regression in PySpark (Spark 2.1.2).
I know it is possible to save a fitted model as follows:
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(featuresCol='features',
                        labelCol='is_clickout',
                        regParam=0,
                        fitIntercept=False,
                        family="binomial")
pipeline = Pipeline(stages=[lr])
model = pipeline.fit(data)
# save the fitted PipelineModel for future use
save_path = "model_0"
model.save(save_path)
The problem is that the saved model does not save the summary:
from pyspark.ml.classification import LogisticRegressionModel
model2 = LogisticRegressionModel.load(save_path)
model2.hasSummary  # returns False
I can extract the summary as follows, but it has no save method attached to it:
# Get the model summary
summary = model.stages[-1].summary
Is there a quick way to save the summary object, ideally for multiple regressions at once?
Currently, I read each attribute off the summary object and store them in a pandas DataFrame df.
You can save your model by using the save method of MLlib models, and then load it later in another application. As @zero323 stated before, another way to achieve this is by using the Predictive Model Markup Language (PMML).
A Spark Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. A Pipeline is itself an Estimator: after its fit() method runs, it produces a PipelineModel, which is a Transformer and is what you use at test time.
Unfortunately, your observation is correct. I had the same problem with Spark 2.4.3 and I've found this comment confirming the issue:
For LinearRegressionModel, this does NOT currently save the training summary. An option to save summary may be added in the future.
This same comment is still there for Spark 3.0.0-rc1 (the last available tag in its repository).
If we want to persist the summary, we need to serialize it somehow ourselves. I've done this before by extracting the statistics I wanted and saving them in a JSON document just after training my model.
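As a sketch of that approach (the attribute list below is an assumption; check which fields your Spark version's summary class actually exposes), you can copy the interesting statistics into a plain dict and dump it as JSON next to the saved model:

```python
import json

# Summary attributes worth keeping. This list is an assumption: adjust it to
# whatever your LogisticRegressionTrainingSummary exposes (for example,
# accuracy and areaUnderROC exist on binary summaries in Spark 2.x).
SUMMARY_FIELDS = ("accuracy", "areaUnderROC", "totalIterations", "objectiveHistory")

def summary_to_dict(summary, fields=SUMMARY_FIELDS):
    """Pull plain-Python values off the summary so json.dump can handle them."""
    return {name: getattr(summary, name) for name in fields if hasattr(summary, name)}

def save_summary(summary, path):
    """Persist the extracted statistics as a JSON document."""
    with open(path, "w") as fh:
        json.dump(summary_to_dict(summary), fh, indent=2)

# Right after training, before the summary is lost to save/load:
#   save_summary(model.stages[-1].summary, "model_0_summary.json")
```

For multiple regressions, you can call this in the training loop with a per-model path, which keeps the statistics alongside each saved model directory.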