Extract results from CrossValidator with paramGrid in pySpark

Question

I am training a Random Forest with PySpark and want a CSV with the results, one row per point in the parameter grid. My code is:

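from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

estimator = RandomForestRegressor()
evaluator = RegressionEvaluator()
# 2 x 2 grid over numTrees and maxDepth; impurity and featureSubsetStrategy are held fixed
paramGrid = ParamGridBuilder().addGrid(estimator.numTrees, [2,3])\
                              .addGrid(estimator.maxDepth, [2,3])\
                              .addGrid(estimator.impurity, ['variance'])\
                              .addGrid(estimator.featureSubsetStrategy, ['sqrt'])\
                              .build()
pipeline = Pipeline(stages=[estimator])

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)

# result is the training DataFrame (prepared earlier, not shown)
cvModel = crossval.fit(result)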

So I want a CSV like this:

numTrees | maxDepth | impurityMeasure
2        | 2        | 0.001
2        | 3        | 0.00023
etc.

What is the best way to do this?

Alper t. Turker · Accepted Answer

You'll have to combine two pieces of data:

The estimator ParamMaps, retrieved with the getEstimatorParamMaps method.
The cross-validated metrics, available through the avgMetrics attribute (one value per ParamMap, averaged over the folds, in the same order as the ParamMaps).

First get names and values of all parameters declared in the map:

params = [{p.name: v for p, v in m.items()} for m in cvModel.getEstimatorParamMaps()]

Then zip these with the metrics and convert to a data frame:

import pandas as pd

# One row per parameter combination: the averaged metric plus the parameter values
pd.DataFrame([
    {cvModel.getEvaluator().getMetricName(): metric, **ps}
    for ps, metric in zip(params, cvModel.avgMetrics)
])
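
Since the question asks for a CSV file, here is a minimal sketch of the whole step using only the standard library csv module, in case pandas is not available on the driver. The function name grid_results_to_csv, the FakeParam stand-in, and the demo values are all made up for illustration; FakeParam only mimics the .name attribute of a Spark Param so the sketch runs without Spark.

```python
import csv
import io
from collections import namedtuple

# Stand-in for pyspark.ml.param.Param (hypothetical, only so this runs without Spark).
FakeParam = namedtuple("FakeParam", ["name"])


def grid_results_to_csv(param_maps, avg_metrics, metric_name, out):
    """Write one CSV row per parameter combination to the file-like object `out`."""
    rows = [
        {**{p.name: v for p, v in pm.items()}, metric_name: metric}
        for pm, metric in zip(param_maps, avg_metrics)
    ]
    # Column names in first-seen order, so maps with differing keys still work.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return rows


# Made-up values standing in for cvModel.getEstimatorParamMaps() and cvModel.avgMetrics.
num_trees, max_depth = FakeParam("numTrees"), FakeParam("maxDepth")
demo_maps = [{num_trees: 2, max_depth: 2}, {num_trees: 2, max_depth: 3}]
demo_metrics = [0.5, 0.25]

buffer = io.StringIO()
demo_rows = grid_results_to_csv(demo_maps, demo_metrics, "rmse", buffer)
csv_text = buffer.getvalue()
```

With a fitted model, the same function could presumably be called with cvModel.getEstimatorParamMaps(), cvModel.avgMetrics, cvModel.getEvaluator().getMetricName(), and a file opened with open("results.csv", "w", newline=""). If you already have the pandas data frame from above, calling its to_csv method is the shorter route.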

Tags: python, apache-spark, pyspark, apache-spark-ml