I want to optimize the hyperparameters of a PySpark Pipeline using a ranking metric (MAP@k). I have seen in the documentation how to use the metrics already defined in the Evaluation module (Scala), but I need to define a custom evaluator class because MAP@k is not implemented yet. So I need to do something like:
model = Pipeline(stages=[indexer, assembler, scaler, lg])

paramGrid_lg = ParamGridBuilder() \
    .addGrid(lg.regParam, [0.001, 0.1]) \
    .addGrid(lg.elasticNetParam, [0, 1]) \
    .build()

crossval_lg = CrossValidator(estimator=model,
                             estimatorParamMaps=paramGrid_lg,
                             evaluator=MAPkEvaluator(),
                             numFolds=2)
where MAPkEvaluator() is my custom evaluator. I've seen a similar question, but not the answer.
Is there any example or documentation available for this? Does anyone know if it is possible to implement it in PySpark? What methods should I implement?
@jarandaf answered the question in the first comment, but for clarity's sake here is how to implement a basic example with a random metric:
import random
from pyspark.ml.evaluation import Evaluator

class RandomEvaluator(Evaluator):

    def __init__(self, predictionCol="prediction", labelCol="label"):
        super(RandomEvaluator, self).__init__()
        self.predictionCol = predictionCol
        self.labelCol = labelCol

    def _evaluate(self, dataset):
        """
        Returns a random number.
        Implement here the true metric.
        """
        return random.randint(0, 1)

    def isLargerBetter(self):
        return True
Now the following code should work:
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

paramGrid_lg = ParamGridBuilder() \
    .addGrid(lg.regParam, [0.01, 0.1]) \
    .addGrid(lg.elasticNetParam, [0, 1]) \
    .build()

crossval_lg = CrossValidator(estimator=model,
                             estimatorParamMaps=paramGrid_lg,
                             evaluator=RandomEvaluator(),
                             numFolds=2)

cvModel = crossval_lg.fit(train_val_data_)
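To compute an actual MAP@k instead of a random number, one option is to delegate to RankingMetrics from pyspark.mllib.evaluation inside _evaluate. The sketch below is an assumption on my part, not part of the original answer: it assumes the dataset has one row per user/query, with an array column of ranked predicted items (predictionCol) and an array column of relevant items (labelCol); the column names and the k parameter are hypothetical.

from pyspark.ml.evaluation import Evaluator
from pyspark.mllib.evaluation import RankingMetrics

class MAPkEvaluator(Evaluator):
    """Sketch of a MAP@k evaluator built on top of RankingMetrics."""

    def __init__(self, k=10, predictionCol="prediction", labelCol="label"):
        super(MAPkEvaluator, self).__init__()
        self.k = k
        self.predictionCol = predictionCol
        self.labelCol = labelCol

    def _evaluate(self, dataset):
        # Expects one row per user/query:
        #   predictionCol: array of recommended items, best first
        #   labelCol: array of ground-truth relevant items
        predictionAndLabels = dataset.select(self.predictionCol, self.labelCol) \
            .rdd.map(lambda row: (row[0][:self.k], row[1]))
        metrics = RankingMetrics(predictionAndLabels)
        # meanAveragePrecision over the top-k truncated lists approximates MAP@k;
        # Spark 3.0+ also exposes metrics.meanAveragePrecisionAt(k) directly.
        return metrics.meanAveragePrecision

    def isLargerBetter(self):
        return True

With such a class in place, the CrossValidator snippet from the question should work unchanged, e.g. with evaluator=MAPkEvaluator(k=5).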
@Amanda answered the question very well, but let me show you something to watch out for too. If you check the help of the Evaluator class by doing:
help(Evaluator)
you'll see a method defined there:
isLargerBetter(self)
| Indicates whether the metric returned by :py:meth:`evaluate` should be maximized
| (True, default) or minimized (False).
| A given evaluator may support multiple metrics which may be maximized or minimized.
|
| .. versionadded:: 1.5.0
Now if your metric needs to be minimized, you need to override this method as:
def isLargerBetter(self):
return False
The default value of this method is True.