I would like to know the difference between the score returned by GridSearchCV and the R2 metric calculated as below. In other cases the grid search score comes out highly negative (the same applies to cross_val_score), and I would be grateful for an explanation of why that happens.
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
X = pd.DataFrame(X)

parameters = {'splitter': ('best', 'random'),
              'max_depth': np.arange(1, 10),
              'min_samples_split': np.arange(2, 10),
              'min_samples_leaf': np.arange(1, 5)}

regressor = GridSearchCV(DecisionTreeRegressor(), parameters, scoring='r2', cv=5)
regressor.fit(X, y)

print('Best score: ', regressor.best_score_)
best = regressor.best_estimator_
print('R2: ', r2_score(y_pred=best.predict(X), y_true=y))
The regressor.best_score_ is the average of the R2 scores on the left-out test folds for the best parameter combination.

In your example, cv=5, so the data is split into train and test folds 5 times. The model is fitted on the train fold and scored on the test fold. These 5 test scores are averaged to get the score. See the documentation:

"best_score_: Mean cross-validated score of the best_estimator"

This process is repeated for every parameter combination, and the best average score among them is assigned to best_score_.
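You can verify that averaging yourself from cv_results_. A minimal sketch, using the standard split*_test_score keys and the best_index_ attribute of a fitted GridSearchCV:

# After regressor.fit(X, y) from the snippet above:
import numpy as np

i = regressor.best_index_  # row of cv_results_ for the best parameter combination
fold_scores = [regressor.cv_results_[f'split{k}_test_score'][i] for k in range(5)]

print(fold_scores)            # the 5 held-out R2 scores
print(np.mean(fold_scores))   # their average ...
print(regressor.best_score_)  # ... is exactly best_score_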
You can look at my other answer for a complete walk-through of GridSearchCV.

After finding the best parameters, the model is refitted on the full data. So r2_score(y_pred = best.predict(X), y_true = y) is computed on the same data the model was trained on, and in most cases it will therefore be higher.
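If you want an honest score for the refitted model, evaluate it on data it has never seen. A minimal sketch, assuming you hold out a test split before running the search (the X_train/X_test names are introduced here, not part of your snippet):

from sklearn.model_selection import train_test_split

# Hypothetical variant of the original snippet: set aside a test set first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor.fit(X_train, y_train)
best = regressor.best_estimator_

print('CV score: ', regressor.best_score_)                    # mean R2 on held-out folds
print('Train R2: ', r2_score(y_train, best.predict(X_train))) # optimistic
print('Test R2:  ', r2_score(y_test, best.predict(X_test)))   # honest estimate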
The question linked by @Davide in the comments explains why you get a positive R2 score here: your model performs better than a constant prediction. Conversely, you can get negative values in other situations, when a model performs worse than simply predicting the mean.
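You can see this behaviour of R2 directly on toy data (the arrays below are made up purely for illustration):

from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0, 4.0]

print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean -> 0.0
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean -> -3.0
print(r2_score(y_true, [1.1, 2.1, 2.9, 4.2]))  # close to the truth -> near 1.0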
The reason for the difference in values is that regressor.best_score_ is averaged over the held-out folds of the 5-fold split, where each fold is scored by a model that never saw it, whereas r2_score(y_pred = best.predict(X), y_true = y) evaluates the refitted model (regressor.best_estimator_) on the full sample, all of which was used to train that estimator.
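You can reproduce the cross-validated figure for the refitted estimator with cross_val_score; with the same non-shuffled 5-fold split it should come out close to best_score_ (it may not match exactly, since the tree fit is not deterministic without a fixed random_state):

from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(best, X, y, scoring='r2', cv=5)

print(cv_scores.mean())              # roughly matches regressor.best_score_
print(r2_score(y, best.predict(X)))  # training-data R2, typically higher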