
Spark: Extracting summary for a ML logistic regression model from a pipeline model

I've estimated a logistic regression using pipelines.

My last few lines before fitting the logistic regression:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Create an assembler that combines the numeric and encoded categorical features
lr_assembler = VectorAssembler(
    inputCols=numericColumns
    + [categoricalCol + "ClassVec" for categoricalCol in categoricalColumns],
    outputCol="lr_features")
# Model definition:
lr = LogisticRegression(featuresCol="lr_features", labelCol="targetvar")
# Pipeline definition:
lr_pipeline = Pipeline(stages=indexStages + encodeStages + [lr_assembler, lr])
# Fit the pipeline (this returns a PipelineModel, not a LogisticRegressionModel):
lrModel = lr_pipeline.fit(train_train)

I then tried to get the summary of the fitted model. However, the line below:

trainingSummary = lrModel.summary

results in: 'PipelineModel' object has no attribute 'summary'

Any advice on how to extract the summary information that a fitted regression model usually exposes when the model is part of a pipeline model?

Thanks a lot!

user3245256 asked Dec 06 '17 23:12






1 Answer

Just get the model from stages:

lrModel.stages[-1].summary

If the model sits earlier in the pipeline, replace -1 with its index in stages.
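When the position of the model in the pipeline is not known in advance, one option is to look the stage up by type instead of by index. The following is only a minimal sketch: the helper name find_stage and the Fake* stand-in classes are hypothetical, introduced here so the snippet runs without a Spark session; the only thing it relies on is a stages list like the one a PipelineModel exposes.

```python
def find_stage(pipeline_model, stage_type):
    """Return the first fitted stage that is an instance of stage_type."""
    for stage in pipeline_model.stages:
        if isinstance(stage, stage_type):
            return stage
    raise ValueError("no stage of type %s in pipeline" % stage_type.__name__)


# Hypothetical stand-ins that mimic only the attributes used above,
# so the sketch can be run without a Spark session.
class FakeAssembler:
    pass


class FakeLogisticRegressionModel:
    summary = "training summary"


class FakePipelineModel:
    def __init__(self, stages):
        self.stages = stages


fitted = FakePipelineModel([FakeAssembler(), FakeLogisticRegressionModel()])

# Look the model stage up by type rather than by position.
lr_stage = find_stage(fitted, FakeLogisticRegressionModel)
print(lr_stage.summary)
```

With the pipeline from the question, the equivalent call would presumably be find_stage(lrModel, LogisticRegressionModel).summary, with LogisticRegressionModel imported from pyspark.ml.classification. The returned summary then exposes attributes such as objectiveHistory.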

Alper t. Turker answered Oct 26 '22 18:10