I used to use GridSearchCV(...scoring="accuracy"...) for classification models. Now I am about to use GridSearchCV for a regression model and want to set scoring to my own error function.
Example code:
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def rmse(predict, actual):
    predict = np.array(predict)
    actual = np.array(actual)
    distance = predict - actual
    square_distance = distance ** 2
    mean_square_distance = square_distance.mean()
    score = np.sqrt(mean_square_distance)
    return score
rmse_score = make_scorer(rmse)
gsSVR = GridSearchCV(...scoring=rmse_score...)
gsSVR.fit(X_train,Y_train)
SVR_best = gsSVR.best_estimator_
print(gsSVR.best_score_)
However, I found that this approach returns the parameter set for which the error score is highest. As a result, I got the worst parameter set and score. In this case, how can I get the best estimator and score?
Summary:
classification -> GridSearchCV(scoring="accuracy") -> best_estimator_ ... best
regression -> GridSearchCV(scoring=rmse_score) -> best_estimator_ ... worst
That is technically a loss, where lower is better. You can turn that option on in make_scorer:
greater_is_better : boolean, default=True
    Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.
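To see the sign flip concretely, here is a small check (the DummyRegressor and the toy arrays are just an assumed illustration, not from the original post): the same score function gives a positive RMSE through a plain scorer and the negated value through the loss-style scorer.

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer

def rmse(actual, predict):
    return np.sqrt(np.mean((np.array(predict) - np.array(actual)) ** 2))

X = np.zeros((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])
est = DummyRegressor(strategy="mean").fit(X, y)   # always predicts 2.5

plain = make_scorer(rmse)(est, X, y)                              # +RMSE
flipped = make_scorer(rmse, greater_is_better=False)(est, X, y)   # -RMSE
print(plain, flipped)   # roughly 1.118 and -1.118

Because GridSearchCV always maximizes the scorer value, maximizing -RMSE is the same as minimizing RMSE, which is what you want.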
You also need to change the order of inputs from rmse(predict, actual) to rmse(actual, predict), because that's the order in which GridSearchCV will pass them (ground truth first, then predictions). So the final scorer will look like this:
def rmse(actual, predict):
    ...
    ...
    return score

rmse_score = make_scorer(rmse, greater_is_better=False)
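Putting it together, here is a minimal end-to-end sketch; the SVR parameter grid and the synthetic training data are assumptions for illustration, not from the original question. Because the scorer sign-flips the loss, best_score_ comes back as a negative RMSE, so negate it when reporting the error.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def rmse(actual, predict):
    # GridSearchCV / make_scorer call this as (y_true, y_pred)
    actual = np.array(actual)
    predict = np.array(predict)
    return np.sqrt(((predict - actual) ** 2).mean())

# greater_is_better=False marks rmse as a loss: the scorer returns -rmse,
# so GridSearchCV's "pick the highest score" now picks the lowest error
rmse_score = make_scorer(rmse, greater_is_better=False)

# synthetic data and parameter grid, purely for illustration
X_train, Y_train = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}

gsSVR = GridSearchCV(SVR(), param_grid, scoring=rmse_score, cv=5)
gsSVR.fit(X_train, Y_train)

SVR_best = gsSVR.best_estimator_   # now the parameter set with the lowest RMSE
print(-gsSVR.best_score_)          # best_score_ is the negated RMSE, so flip the sign back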