I would like to run a regular grid search without the CV, i.e. I don't want to cross-validate, but setting cv=1 is not allowed.
I am doing this because I am using a classifier to draw decision boundaries and visualize/understand my data instead of predicting labels, and do not care about the generalization error. I would like to minimize the training error instead.
EDIT: I guess I'm really asking two questions:

1. Is it possible to use cv=1 in GridSearchCV? Answered by ogrisel below.
2. How do I change the scoring parameter in GridSearchCV?
What you would need to do is: use the cv argument (see the docs) and give it a generator that yields a tuple containing all indices, so that the train and test sets are the same; then change the scoring argument to use the OOB score reported by the random forest.
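A minimal sketch of the first part of that answer: GridSearchCV's cv parameter accepts an iterable of (train_indices, test_indices) pairs, so yielding the full index set for both effectively disables cross-validation and scores on the training set. The dataset and parameter grid here are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data purely for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# One "split" where train and test are the same indices,
# so the score reported by GridSearchCV is the training score.
indices = np.arange(len(X))
cv = [(indices, indices)]

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50]},
    cv=cv,
)
search.fit(X, y)
print(search.best_params_)
```

Note that with train == test the selected parameters will favor whatever overfits the training data best, which is exactly the stated goal here (minimizing training error rather than generalization error).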
In GridSearchCV, cross-validation is performed along with the grid search. Cross-validation is used while training the model: before training the model on the data, we divide the data into two parts, train data and test data.
The only difference between the two approaches is that in grid search we define the combinations and train the model on each one, whereas RandomizedSearchCV samples the combinations randomly. Both are very effective ways of tuning parameters to increase model generalizability.
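The contrast can be sketched as follows: RandomizedSearchCV tries only n_iter randomly sampled combinations instead of the full grid. The dataset and the particular grid values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy data purely for illustration.
X, y = make_classification(n_samples=200, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [10, 25, 50, 100],
        "max_depth": [2, 4, None],
    },
    n_iter=4,        # sample only 4 of the 12 possible combinations
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

With a full GridSearchCV over the same grid, all 12 combinations would be fitted; the randomized variant trades exhaustiveness for speed.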
Observing the time numbers above: for a parameter grid with 3125 combinations, Grid Search CV took 10856 seconds (~3 hrs) whereas Halving Grid Search CV took 465 seconds (~8 min), approximately 23x faster.
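A hedged sketch of how Halving Grid Search CV is invoked in scikit-learn (it is experimental and requires an explicit enabling import; the dataset and grid below are illustrative assumptions, not the 3125-combination grid from the timing comparison):

```python
# HalvingGridSearchCV is experimental; this import enables it.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Toy data purely for illustration.
X, y = make_classification(n_samples=400, random_state=0)

search = HalvingGridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 25, 50], "max_depth": [2, 4, None]},
    factor=3,        # each round keeps roughly the top 1/3 of candidates
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The speedup comes from successive halving: all candidates are first evaluated on a small resource budget, and only the best survivors are refitted with more resources.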
You can pass an instance of ShuffleSplit(test_size=0.20, n_splits=1, random_state=0) as the cv parameter. That will do a single CV split per parameter combination (sklearn.model_selection.ShuffleSplit).
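Put together, that suggestion looks roughly like this (the dataset and parameter grid are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# Toy data purely for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# n_splits=1 gives a single train/test split per parameter combination
# instead of k-fold cross-validation.
cv = ShuffleSplit(test_size=0.20, n_splits=1, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50]},
    cv=cv,
)
search.fit(X, y)
print(search.best_params_)
```

Unlike the train==test trick above, this still holds out 20% of the data, so the reported score is a (single-split) validation score rather than a training score.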