Spark: Extracting summary for a ML logistic regression model from a pipeline model

I've estimated a logistic regression using pipelines.

My last few lines before fitting the logistic regression:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Assembler that combines the numeric columns with the encoded categorical features
lr_assembler = VectorAssembler(
    inputCols=numericColumns + [categoricalCol + "ClassVec" for categoricalCol in categoricalColumns],
    outputCol="lr_features")

# Model definition:
lr = LogisticRegression(featuresCol="lr_features", labelCol="targetvar")
# Pipeline definition (indexStages and encodeStages are built earlier):
lr_pipeline = Pipeline(stages=indexStages + encodeStages + [lr_assembler, lr])
# Fit the whole pipeline, including the logistic regression model:
lrModel = lr_pipeline.fit(train_train)

And then I tried to run the summary of the model. However, the code line below:

trainingSummary = lrModel.summary

results in: AttributeError: 'PipelineModel' object has no attribute 'summary'

Any advice on how to extract, from a pipeline model, the summary information that a regression model usually provides?

Thanks a lot!

asked Dec 06 '17 23:12

user3245256

1 Answer

Just get the model from stages:

lrModel.stages[-1].summary

If the model sits earlier in the pipeline, replace -1 with its index.
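If you would rather not hard-code the index, one option is to look the stage up by its type. The following is only a sketch: the helper names describe_stages, find_stage and logistic_regression_summary are made up for illustration and are not part of PySpark. It assumes the fitted pipeline exposes its stages through the stages attribute, as PipelineModel does, and that the logistic regression stage is a LogisticRegressionModel.

```python
def describe_stages(pipeline_model):
    """List (index, class name) for every fitted stage in the pipeline."""
    return [(i, type(stage).__name__)
            for i, stage in enumerate(pipeline_model.stages)]


def find_stage(pipeline_model, stage_type):
    """Return the last fitted stage that is an instance of stage_type."""
    for stage in reversed(pipeline_model.stages):
        if isinstance(stage, stage_type):
            return stage
    raise ValueError("no stage of type %s in pipeline" % stage_type.__name__)


def logistic_regression_summary(pipeline_model):
    """Fetch the training summary of the logistic regression stage."""
    # Imported lazily so the two helpers above also work without Spark installed.
    from pyspark.ml.classification import LogisticRegressionModel

    lr_stage = find_stage(pipeline_model, LogisticRegressionModel)
    # hasSummary is False when the model carries no training summary,
    # for example after it has been saved and loaded again.
    if not lr_stage.hasSummary:
        raise ValueError("the fitted model carries no training summary")
    return lr_stage.summary
```

With the pipeline from the question this would be called as trainingSummary = logistic_regression_summary(lrModel), and describe_stages(lrModel) shows which index each stage has if you prefer lrModel.stages[i].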

answered Oct 26 '22 18:10

Alper t. Turker

Tags:

python

apache-spark

logistic-regression

pyspark

pipeline
