I am running a linear regression using Spark ML Pipelines in PySpark. Once the linear regression model is trained, how do I get the coefficients out?
Here is my pipeline code:
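from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Get all of our features together into one vector column called "features". Do not include the label!
# (get_column_names is my own helper that returns the feature column names)
feature_assembler = VectorAssembler(inputCols=get_column_names(df_train), outputCol="features")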
# Define our model
lr = LinearRegression(maxIter=100, elasticNetParam=0.80, labelCol="label",
                      featuresCol="features", predictionCol="prediction")
# Define our pipeline
pipeline_baseline = Pipeline(stages=[feature_assembler, lr])
# Train our model using the training data
model_baseline = pipeline_baseline.fit(df_train)
# Use our trained model to make predictions using the validation data
output_baseline = model_baseline.transform(df_val) #.select("features", "label", "prediction", "coefficients")
predictions_baseline = output_baseline.select("label", "prediction")
I have tried using methods from the PipelineModel class. Here are my attempts to get the coefficients, but I only get an empty list and an empty dictionary:
params = model_baseline.stages[1].params
print('Try 1 - Parameters: %s' % params)
params = model_baseline.stages[1].extractParamMap()
print('Try 2 - Parameters: %s' % params)
Out[]:
Try 1 - Parameters: []
Try 2 - Parameters: {}
Are there methods for PipelineModel that return the trained coefficients?
You are looking at the wrong property. params can be used to extract Estimator or Transformer Params, such as input or output columns (see the ML Pipeline parameters docs), not estimated values.
For LinearRegressionModel, use coefficients:
model_baseline.stages[-1].coefficients
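The coefficients come back as a vector in the same order as the assembled features, so it can help to pair them with the column names. Below is a minimal sketch in plain Python, assuming every column passed to VectorAssembler is a numeric scalar (so there is exactly one coefficient per column, in inputCols order). The helpers coefficients_by_feature and predict are hypothetical illustrations, not part of PySpark. With the fitted pipeline above you would pass model_baseline.stages[0].getInputCols() and model_baseline.stages[-1].coefficients.toArray().tolist(); the fitted model also exposes the bias term as model_baseline.stages[-1].intercept.

```python
from typing import Dict, Sequence

# Hypothetical usage with the fitted pipeline from the question (not run here):
#   lr_model = model_baseline.stages[-1]
#   names = model_baseline.stages[0].getInputCols()
#   weights = coefficients_by_feature(names, lr_model.coefficients.toArray().tolist())
#   print(weights, lr_model.intercept)


def coefficients_by_feature(feature_names: Sequence[str],
                            coefficients: Sequence[float]) -> Dict[str, float]:
    """Pair each assembler input column with its fitted weight.

    Assumes each input column contributes exactly one slot to the
    assembled feature vector, in inputCols order.
    """
    if len(feature_names) != len(coefficients):
        raise ValueError(
            f"expected one coefficient per column, got {len(coefficients)} "
            f"coefficients for {len(feature_names)} columns")
    return dict(zip(feature_names, (float(c) for c in coefficients)))


def predict(coefficients: Sequence[float], intercept: float,
            features: Sequence[float]) -> float:
    """Linear prediction: intercept + sum of coefficient * feature value."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


if __name__ == "__main__":
    # Made-up numbers standing in for lr_model.coefficients / lr_model.intercept
    names = ["x1", "x2"]
    coefs = [2.0, -0.5]
    print(coefficients_by_feature(names, coefs))  # {'x1': 2.0, 'x2': -0.5}
    print(predict(coefs, 1.0, [3.0, 4.0]))        # 1.0 + 6.0 - 2.0 = 5.0
```

If some input columns are themselves vectors (for example one-hot encoded features), the assembled vector has more slots than inputCols, and this simple name pairing no longer lines up.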