Combination of GridSearchCV's refit and scorer unclear

Question

I use GridSearchCV to find the best parameters in the inner loop of my nested cross-validation. The 'inner winner' is found using GridSearchCV(scoring='balanced_accuracy'), so, as I understand the documentation, the model with the highest average balanced accuracy across the inner folds becomes the best_estimator_. I don't understand what the different values of refit in GridSearchCV do in combination with the scoring argument. If refit is True, which scoring function is used to estimate the performance of that 'inner winner' when it is refitted on the dataset? Is it the same function that was passed to scoring (so, in my case, 'balanced_accuracy')? And why can you also pass a string to refit? Does that mean you can use different functions 1) to find the 'inner winner' and 2) to estimate the performance of that 'inner winner' on the whole dataset?
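For reference, here is a minimal sketch of the nested setup described in the question. It assumes a KNeighborsClassifier and a synthetic, mildly imbalanced dataset from make_classification standing in for the real data; the parameter grid and fold counts are illustrative choices, not the asker's. GridSearchCV picks the inner winner by mean balanced accuracy over the inner folds, while the performance estimate comes from scoring each refitted winner on the held-out outer folds in cross_val_score.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic, mildly imbalanced binary data stands in for the real dataset
X, y = make_classification(n_samples=300, weights=[0.7, 0.3], random_state=0)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: the 'inner winner' is the n_neighbors value with the highest
# mean balanced accuracy over the inner folds (note: the argument is `scoring`).
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},  # illustrative grid
    scoring="balanced_accuracy",
    cv=inner_cv,
    refit=True,  # refit the winner on each outer training split
)

# Outer loop: each outer training split runs the full grid search, the winner
# is refitted on that split, and it is then scored on the held-out outer fold.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")
print(outer_scores.mean(), outer_scores.std())
```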

Shihab Shahriar Khan · Accepted Answer

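When refit=True, sklearn refits the winning model on the entire training set it was given (everything passed to fit). No scoring happens during this refit, so there is no held-out data left to estimate its performance with any scorer function. The best_score_ attribute is the mean cross-validated score of the winning parameters over the inner folds, computed before the refit.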
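To make this concrete, here is a small sketch, again assuming a KNeighborsClassifier on synthetic make_classification data with an illustrative grid. It checks that best_score_ is simply the winner's mean score from cv_results_, and that best_estimator_ was fitted on every sample passed to fit.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data and an illustrative grid
X, y = make_classification(n_samples=300, weights=[0.7, 0.3], random_state=0)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    scoring="balanced_accuracy",
    cv=3,
    refit=True,
)
search.fit(X, y)

# best_score_ is the winner's mean balanced accuracy over the CV folds;
# it is read from cv_results_, not computed by evaluating the refitted model.
best_idx = search.best_index_
print(search.best_params_, search.best_score_)
print(search.cv_results_["mean_test_score"][best_idx])  # same number

# best_estimator_ was refitted on all 300 samples with the winning parameters.
print(search.best_estimator_.n_samples_fit_)
```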

If you use multiple scorers in GridSearchCV, say f1_score or precision alongside your balanced_accuracy, sklearn needs to know which of them to use to pick the "inner winner", as you call it. For example, with KNN, f1_score might be highest at K=5 while accuracy is highest at K=10. Without further instruction, sklearn has no way of knowing which value of the hyper-parameter K is the best.
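A sketch of that situation, assuming KNN on synthetic make_classification data (the grid and the three metric names are illustrative). With several scorers and refit=False, GridSearchCV only records one ranking per metric in cv_results_; whether the top-ranked K actually differs between metrics depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data and an illustrative grid
X, y = make_classification(n_samples=300, weights=[0.7, 0.3], random_state=0)
metrics = ["balanced_accuracy", "f1", "accuracy"]

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 15]},
    scoring=metrics,
    cv=3,
    refit=False,  # no single winner requested: only collect the scores
)
search.fit(X, y)

results = search.cv_results_
for metric in metrics:
    # rank 1 marks the best candidate for this particular metric
    best = results["rank_test_" + metric].argmin()
    print(metric, results["params"][best], results["mean_test_" + metric][best])
```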

To resolve that, you pass the name of one of those scorers (a string) to refit to specify which of them ultimately decides the best hyper-parameters. That best setting is then used to retrain, or refit, the model on the full dataset. So when you have just one scorer, as seems to be your case, you don't need to worry about this: refit=True is sufficient.
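A sketch of both cases, assuming KNN on synthetic make_classification data with an illustrative grid. With several scorers, refit="balanced_accuracy" lets balanced accuracy alone pick the winner while the other metrics are still reported; with a single scorer, refit=True should select the same winner. Passing refit=True together with several scorers is ambiguous, and scikit-learn rejects it with a ValueError when fit is called.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data and an illustrative grid
X, y = make_classification(n_samples=300, weights=[0.7, 0.3], random_state=0)
grid = {"n_neighbors": [1, 3, 5, 7, 9, 15]}
metrics = ["balanced_accuracy", "f1", "accuracy"]

# Several scorers are computed, but only 'balanced_accuracy' picks the winner.
search = GridSearchCV(KNeighborsClassifier(), grid, scoring=metrics, cv=3,
                      refit="balanced_accuracy")
search.fit(X, y)
idx = search.best_index_
print(search.best_params_)
print({m: search.cv_results_["mean_test_" + m][idx] for m in metrics})

# With a single scorer, refit=True is enough: there is only one ranking.
single = GridSearchCV(KNeighborsClassifier(), grid, scoring="balanced_accuracy",
                      cv=3, refit=True)
single.fit(X, y)
print(single.best_params_)  # same winner as the multi-metric search

# With several scorers, refit=True does not say which metric should decide.
refit_true_rejected = False
ambiguous = GridSearchCV(KNeighborsClassifier(), grid, scoring=metrics, cv=3,
                         refit=True)
try:
    ambiguous.fit(X, y)
except ValueError as err:
    refit_true_rejected = True
    print("ValueError:", err)
```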

Tags:

python

scikit-learn

grid-search

Johannes Wiesner

1 Answer

Shihab Shahriar Khan
