
How to run GridSearchCV without cross-validation?

Tags:

scikit-learn

I would like to run a regular 'gridsearch without the CV', i.e. I don't want to cross-validate, but setting cv=1 is not allowed.

I am doing this because I am using a classifier to draw decision boundaries and visualize/understand my data instead of predicting labels, and do not care about the generalization error. I would like to minimize the training error instead.

EDIT: I guess I'm really asking two questions

  1. How to hack cv=1 in GridSearchCV? Answered by ogrisel below
  2. Does it make sense to do a gridsearch to minimize training error instead of generalization error, and if so, how would I do that? I suspect it involves inserting my own scoring function for the scoring parameter in GridSearchCV?
asked Apr 08 '15 00:04 by selwyth



1 Answer

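You can pass an instance of sklearn.model_selection.ShuffleSplit, for example ShuffleSplit(test_size=0.20, n_splits=1, random_state=0), as the cv parameter.

That performs a single train/test split per parameter combination: each candidate is fit on 80% of the data and scored on the held-out 20%, instead of being averaged over several folds.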
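A minimal sketch of that approach follows. The dataset (make_classification), the SVC estimator, and the parameter grid are illustrative assumptions, not part of the original question. The second search is one possible way to score on the training data itself (question 2 above): it relies on cv also accepting an iterable of (train, test) index arrays and passes a single split in which both index arrays cover every sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC

# Toy data and grid, assumed purely for illustration.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# One 80/20 split per parameter combination instead of k folds.
single_split = ShuffleSplit(n_splits=1, test_size=0.20, random_state=0)
search = GridSearchCV(SVC(), param_grid, cv=single_split)
search.fit(X, y)
print(search.best_params_, search.best_score_)

# Variant: train and score on the same samples, so the grid search
# selects the parameters with the best training score.
all_idx = np.arange(len(X))
train_search = GridSearchCV(SVC(), param_grid, cv=[(all_idx, all_idx)])
train_search.fit(X, y)
print(train_search.best_params_, train_search.best_score_)
```

Note that the second variant tends to pick the most flexible model in the grid, since nothing penalizes overfitting; that matches the stated goal of minimizing training error, but the resulting score says nothing about generalization.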

answered Oct 25 '22 02:10 by ogrisel
