Does GridSearchCV use predict or predict_proba, when using auc_score as score function?
The predict function generates predicted class labels, which will always result in a triangular ROC curve. A more curved ROC curve is obtained from the predicted class probabilities, which are, as far as I know, the more accurate basis. If so, the area under the 'curved' ROC curve is probably the best measure of classification performance within the grid search.
Therefore I am curious whether the class labels or the class probabilities are used in the grid search when the area under the ROC curve is the performance measure. I tried to find the answer in the code, but could not figure it out. Does anyone here know the answer?
Thanks
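For illustration, here is a minimal sketch of the difference the question describes, computing AUC once from hard labels and once from probabilities. The dataset and the LogisticRegression classifier are only placeholder assumptions, written against the current scikit-learn API:

```python
# Sketch: AUC from hard labels vs. AUC from predicted probabilities.
# The data and classifier below are placeholders, not from the original question.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard labels: the ROC curve has a single corner, so it looks "triangular".
auc_labels = roc_auc_score(y_test, clf.predict(X_test))
# Probabilities of the positive class: the usual, properly curved ROC.
auc_proba = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(auc_labels, auc_proba)
```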
Note that GridSearchCV will use the same shuffling for each set of parameters validated by a single call to its fit method.
GridSearchCV tries all combinations of the values passed in the dictionary and evaluates the model for each combination using cross-validation. After running it we therefore get an accuracy/loss for every combination of hyperparameters, and we can choose the one with the best performance.
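A minimal sketch of that behaviour; the estimator, grid, and dataset below are only illustrative assumptions:

```python
# Sketch: GridSearchCV evaluates every parameter combination with cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold CV for each of the 6 combinations
search.fit(X, y)

# Mean cross-validated score for every combination, and the best one
print(search.cv_results_["mean_test_score"])
print(search.best_params_, search.best_score_)
```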
The predict method is used to predict the actual class, while the predict_proba method can be used to infer the class probabilities (i.e. the probability that a particular data point falls into each of the underlying classes).
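A short sketch of that difference, using an arbitrary classifier on a toy dataset (both are assumptions for illustration):

```python
# Sketch: predict returns hard class labels; predict_proba returns one
# probability per class (columns follow clf.classes_).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict(X[:3]))        # class labels, e.g. [0 0 0]
print(clf.predict_proba(X[:3]))  # shape (3, 3): probability for each class
print(clf.classes_)              # column order of predict_proba
```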
GridSearchCV is a technique for searching for the best parameter values over a given grid of parameters. It is essentially a cross-validation method: the model and the parameters need to be fed in, the best parameter values are extracted, and predictions are then made.
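A sketch of that workflow, assuming the default refit=True so that GridSearchCV refits the best estimator on the full training data and uses it for predictions; the estimator, grid, and data are placeholders:

```python
# Sketch: extract the best parameters, then predict with the refitted best estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [3, None]},
                      cv=3)
search.fit(X_train, y_train)

print(search.best_params_)       # extracted best parameter values
y_pred = search.predict(X_test)  # predictions from the refitted best estimator
```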
We can also set the scoring parameter of the GridSearchCV model, as follows. By default a regressor is scored with the R-squared metric; here we pass score = make_scorer(mean_squared_error) instead, then fit the model and get the best estimator.
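A sketch of passing such a scorer; note that for an error metric like MSE one would normally also set greater_is_better=False, otherwise the search would favour the largest error. The dataset and estimator are placeholder assumptions:

```python
# Sketch: custom scoring for GridSearchCV via make_scorer.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, noise=10.0, random_state=0)

# greater_is_better=False flips the sign so lower MSE ranks higher
score = make_scorer(mean_squared_error, greater_is_better=False)
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, scoring=score, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_estimator_)
```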
The scoring metric can be any metric of your choice. However, just like the estimator object, the scoring metric should be chosen based on the type of problem the project is trying to solve. The other two parameters in the grid search are where the limitations come into play.
It runs through all the different parameter combinations fed into the parameter grid and produces the best combination of parameters, based on a scoring metric of your choice (accuracy, f1, etc.). Obviously, nothing is perfect, and GridSearchCV is no exception.
To use auc_score for grid searching you really need to use predict_proba or decision_function, as you pointed out. This is not possible in the 0.13 release. If you do score_func=auc_score it will use predict, which doesn't make any sense.
Edit: since 0.14 it is possible to do a grid search using auc_score, by setting the new scoring parameter to roc_auc: GridSearchCV(est, param_grid, scoring='roc_auc'). It will do the right thing and use predict_proba (or decision_function if predict_proba is not available).
See the "what's new" page of the current dev version. You need to install the current master from GitHub to get this functionality, or wait until April (?) for 0.14.
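A sketch of the grid search this answer describes, written against the current API; the estimator, grid, and data are placeholder assumptions:

```python
# Sketch: scoring='roc_auc' makes GridSearchCV rank candidates by AUC computed
# from predict_proba / decision_function, not from predict.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# SVC exposes decision_function, which is what the AUC scorer falls back to
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, scoring='roc_auc', cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)  # best_score_ is a mean AUC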
After performing some experiments with sklearn's SVC (which has predict_proba available), comparing some results from predict_proba and decision_function, it seems that roc_auc in GridSearchCV uses decision_function to compute AUC scores. I found a similar discussion here: Reproducing Sklearn SVC within GridSearchCV's roc_auc scores manually
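A sketch of the kind of manual comparison described above; the data and SVC parameters are placeholder assumptions:

```python
# Sketch: compare AUC computed from SVC's decision_function and from predict_proba.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(C=1.0, probability=True, random_state=0).fit(X_train, y_train)

auc_decision = roc_auc_score(y_test, clf.decision_function(X_test))
auc_proba = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# The two can differ slightly because SVC's probabilities come from Platt scaling
# (an extra cross-validated calibration step), while decision_function is the raw margin.
print(auc_decision, auc_proba)
```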