
How to Access Spark PipelineModel Parameters

I am running a linear regression using Spark Pipelines in pyspark. Once the linear regression model is trained, how do I get the coefficients out?

Here is my pipeline code:

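from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Get all of our features together into one array called "features".  Do not include the label!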
feature_assembler = VectorAssembler(inputCols=get_column_names(df_train), outputCol="features")

# Define our model
lr = LinearRegression(maxIter=100, elasticNetParam=0.80, labelCol="label",
                      featuresCol="features", predictionCol="prediction")

# Define our pipeline
pipeline_baseline = Pipeline(stages=[feature_assembler, lr])

# Train our model using the training data
model_baseline = pipeline_baseline.fit(df_train)

# Use our trained model to make predictions using the validation data
output_baseline = model_baseline.transform(df_val)  #.select("features", "label", "prediction", "coefficients")
predictions_baseline = output_baseline.select("label", "prediction")

I have tried using methods from the PipelineModel class. Here are my attempts to get the coefficients, but I only get an empty list and an empty dictionary:

params = model_baseline.stages[1].params
print('Try 1 - Parameters: %s' % (params,))
params = model_baseline.stages[1].extractParamMap()
print('Try 2 - Parameters: %s' % (params,))

Out[]:
Try 1 - Parameters: []
Try 2 - Parameters: {}

Are there methods for PipelineModel that return the trained coefficients?

M. Oneto asked Aug 03 '16 18:08


1 Answer

You are looking at the wrong property. params is for extracting Estimator or Transformer Params, such as input or output columns (see the ML Pipeline parameters docs), not the values estimated during fitting.

For LinearRegressionModel use coefficients:

model.stages[-1].coefficients
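
To make the output easier to read, the coefficients can be paired with the column names that went into the VectorAssembler. The following is only a sketch: linear_model_summary is a hypothetical helper name (not part of Spark), and it assumes the last pipeline stage is the fitted LinearRegressionModel and that every assembled input column is a plain numeric column, so there is one coefficient per name.

```python
def linear_model_summary(pipeline_model, feature_names):
    """Pair each assembled input column with its fitted coefficient.

    Hypothetical helper, not a Spark API. Assumes the last stage of
    pipeline_model is a fitted LinearRegressionModel and feature_names is
    the list that was passed to VectorAssembler(inputCols=...).
    """
    lr_model = pipeline_model.stages[-1]

    # coefficients is an ML Vector; toArray() turns it into a NumPy array.
    weights = [float(w) for w in lr_model.coefficients.toArray()]

    # If any input column was itself a vector, the counts will not match,
    # so fail loudly instead of silently mislabeling coefficients.
    if len(weights) != len(feature_names):
        raise ValueError(
            "expected %d coefficients, got %d"
            % (len(feature_names), len(weights))
        )

    return {
        "intercept": float(lr_model.intercept),
        "coefficients": dict(zip(feature_names, weights)),
    }
```

With the names from the question, this would be called as linear_model_summary(model_baseline, get_column_names(df_train)). The intercept is read from the same fitted stage via its intercept attribute.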
zero323 answered Oct 09 '22 21:10