Consider the following grid search:

grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)
grid_fit = grid.fit(X_train1, y_train1)
According to Sklearn's documentation, grid_fit.best_score_ returns "the mean cross-validated score of the best_estimator".
To me that would mean that the average of:
cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)
should be exactly the same as grid_fit.best_score_.
However, I am getting a 10% difference between the two numbers. What am I missing?
I am running the grid search on proprietary data, so I am hoping somebody has run into something similar in the past and can guide me without a fully reproducible example. I will try to reproduce this with the Iris dataset if it's not clear enough...
Grid search is essentially a brute-force strategy: it fits and evaluates the model with every hyperparameter combination in the grid. With cross_val_score you don't perform any search (none of the predefined parameter combinations are tried); you simply get the cross-validation score of the single estimator you pass in.
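To make the contrast concrete, here is a minimal sketch on a toy dataset; the SVC and the small grid are stand-ins for your clf and parameters, not your actual setup:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
parameters = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# GridSearchCV cross-validates *every* combination in `parameters`
# and keeps the one with the highest mean test score.
grid = GridSearchCV(SVC(), parameters, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# cross_val_score performs no search: it just cross-validates the single
# estimator you hand it, with whatever hyperparameters it already has.
scores = cross_val_score(SVC(C=1, gamma="scale"), X, y, cv=5)
print(scores.mean())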
The "best" parameters that GridSearchCV identifies are only the best among the combinations you included in your parameter grid.
Does GridSearchCV use cross-validation? Yes, it does: for each candidate, part of the data set is held out from fitting so the model can be scored on unseen samples; the model is trained on the training folds and evaluated on the held-out fold.
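One way to see the cross-validation it performs is to inspect grid.cv_results_, which stores the per-fold test scores of every candidate; continuing the sketch above, best_score_ is just the mean of the winning candidate's fold scores:

import numpy as np

# Per-fold test scores of the best candidate from the sketch above.
best = grid.best_index_
fold_scores = [grid.cv_results_["split%d_test_score" % i][best] for i in range(5)]

# best_score_ is the mean of these internally computed fold scores.
print(np.mean(fold_scores), grid.best_score_)  # the two values match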
When an integer is passed to the cv parameter, as in GridSearchCV(..., cv=int_number), StratifiedKFold is used for the cross-validation splitting (for classifiers; KFold otherwise). The data set is therefore split by StratifiedKFold, and the folds generated inside GridSearchCV are not necessarily the same as those generated by your separate cross_val_score call. Different splits can change the accuracy and therefore the best score.
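If you want the two numbers to be directly comparable, one option (a sketch, not the only fix) is to pass the same explicit splitter object to both GridSearchCV and cross_val_score so the folds are guaranteed to match. Note also that the deprecated iid=True in your call weights fold scores by test-fold size, which can by itself make best_score_ differ from the plain mean that cross_val_score reports. The dataset, clf and parameters below are placeholders for your own objects:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Placeholders for the asker's X_train1/y_train1, clf and parameters.
X_train1, y_train1 = make_classification(n_samples=500, random_state=0)
clf = SVC()
parameters = {"C": [0.1, 1, 10]}

cv = StratifiedKFold(n_splits=5)  # one splitter object reused by both calls

grid = GridSearchCV(clf, parameters, n_jobs=-1, cv=cv)
grid_fit = grid.fit(X_train1, y_train1)

scores = cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=cv)
print(grid_fit.best_score_, scores.mean())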