
GridSearchCV.best_score not same as cross_val_score(GridSearchCV.best_estimator_)

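Consider the following grid search:
from sklearn.model_selection import GridSearchCV
# note: iid was deprecated in scikit-learn 0.22 and removed in 0.24
grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)
grid_fit = grid.fit(X_train1, y_train1)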

According to the scikit-learn documentation, grid_fit.best_score_ returns the "mean cross-validated score of the best_estimator".

To me that would mean that the average of:

cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)

should be exactly the same as:

grid_fit.best_score_.

However, I am getting a 10% difference between the two numbers. What am I missing?

I am running the grid search on proprietary data, so I am hoping somebody has run into something similar in the past and can guide me without a fully reproducible example. I will try to reproduce this with the Iris dataset if it's not clear enough.

Eric F asked Jun 15 '18 17:06


People also ask

What is the difference between GridSearchCV and cross-validation?

Grid search is a brute-force strategy: the model is run with every combination of hyperparameters in the grid. cross_val_score does not perform a grid search (it does not iterate over the predefined parameters); it only returns the scores of a single estimator after cross-validation.

What does best score mean in GridSearchCV?

The “best” parameters that GridSearchCV identifies are the best only among the combinations that you included in your parameter grid.

Does GridSearchCV do cross-validation?

GridSearchCV does, in fact, do cross-validation. The idea is to hold out a portion of the data set from the model so that the model can be tested on it: each candidate is trained on the training portion and then evaluated on the held-out portion.
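To see the cross-validation that GridSearchCV performs, the per-fold scores can be read back from cv_results_. The following is only a sketch: the Iris data, the KNeighborsClassifier, and the n_neighbors grid are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical estimator and grid, chosen only for illustration.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 5, 15]}, cv=5)
grid.fit(X, y)

# Collect the test score of the winning candidate on each of the 5 folds.
best = grid.best_index_
fold_scores = np.array(
    [grid.cv_results_[f"split{i}_test_score"][best] for i in range(5)]
)

# best_score_ is the mean of those per-fold test scores.
print(fold_scores, fold_scores.mean(), grid.best_score_)
```

On this data the two printed means should agree, since best_score_ is taken from the mean_test_score entry of the best candidate.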


1 Answer

When an integer is passed as the cv parameter, GridSearchCV(..., cv=int_number), StratifiedKFold is used for cross-validation splitting (provided the estimator is a classifier and the target is binary or multiclass; otherwise KFold is used). If the splits are not identical between the grid search and the later cross_val_score call, the accuracy, and therefore the best score, can differ.
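A minimal sketch of how to rule out split differences, assuming the Iris dataset, an SVC, and a small C grid as stand-ins for the asker's data, clf, and parameters (none of these come from the question): build one splitter with a fixed random_state and pass the same object to both GridSearchCV and cross_val_score, so that both see identical folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One explicit splitter with a fixed seed, reused for both calls,
# so the grid search and cross_val_score evaluate on the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical estimator and grid, stand-ins for clf and parameters.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=cv)
grid.fit(X, y)

# Re-score the winning estimator on the very same folds.
scores = cross_val_score(grid.best_estimator_, X, y, cv=cv)
print(grid.best_score_, scores.mean())
```

If the two numbers still disagree on the real data with a shared splitter, the cause probably lies elsewhere, for example an estimator with its own unseeded randomness, or the iid=True option of older scikit-learn versions, which weights each fold's score by the size of its test set.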

MaxU - stop WAR against UA answered Oct 31 '22 17:10