There is an absolutely helpful class, GridSearchCV, in scikit-learn to do grid search and cross-validation, but I don't want to do cross-validation. I want to do grid search without cross-validation and use the whole data to train. To be more specific, I need to evaluate my model made by RandomForestClassifier with the "oob score" during grid search. Is there an easy way to do it, or should I make a class by myself?
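For context, the "oob score" referred to here is the out-of-bag estimate that RandomForestClassifier exposes as the oob_score_ attribute when the model is built with oob_score=True. A minimal sketch, using toy data and illustrative parameters:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Toy data purely for illustration
    X, y = make_classification(n_samples=500, random_state=0)

    # oob_score=True makes the fitted estimator expose oob_score_
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X, y)
    print("OOB: %0.5f" % rf.oob_score_)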
The points are:
Cross-validation is a method for robustly estimating test-set performance (generalization) of a model. Grid-search is a way to select the best of a family of models, parametrized by a grid of parameters.
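To make the distinction concrete, a standard GridSearchCV run does both at once: it enumerates the grid and scores each candidate with cross-validation. A sketch with an illustrative grid and settings (assuming X and y as above):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Illustrative parameter grid; values are assumptions, not recommendations
    param_grid = {"max_features": ["sqrt", "log2"], "min_samples_leaf": [1, 5, 10]}

    # Each of the 6 candidates is scored with 5-fold cross-validation
    search = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                          param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)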
RandomizedSearchCV and GridSearchCV, on the other hand, are widely different. Depending on the n_iter chosen, RandomizedSearchCV can be two, three, or four times faster than GridSearchCV. However, the higher the n_iter chosen, the lower the speed of RandomizedSearchCV will be and the closer the algorithm will get to GridSearchCV.
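A hedged sketch of that trade-off: RandomizedSearchCV samples only n_iter parameter settings from the space, so with a small n_iter it fits far fewer candidates than an exhaustive GridSearchCV over the same grid (illustrative values, assuming X and y as above):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    # 4 x 4 = 16 possible combinations; n_iter=5 tries only 5 of them
    param_dist = {"max_depth": [3, 5, 10, None], "min_samples_leaf": [1, 2, 5, 10]}
    search = RandomizedSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                                param_dist, n_iter=5, cv=5, random_state=0)
    search.fit(X, y)
    print(search.best_params_)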
I would really advise against using OOB to evaluate a model, but it is useful to know how to run a grid search outside of GridSearchCV()
(I frequently do this so I can save the CV predictions from the best grid for easy model stacking). I think the easiest way is to create your grid of parameters via ParameterGrid()
and then just loop through every set of params. For example, assuming you have a grid dict named "grid" and an RF model object named "rf", you can do something like this:
    from sklearn.model_selection import ParameterGrid

    best_score = 0
    best_grid = None
    for g in ParameterGrid(grid):
        rf.set_params(**g)
        rf.fit(X, y)
        # save if best
        if rf.oob_score_ > best_score:
            best_score = rf.oob_score_
            best_grid = g

    print("OOB: %0.5f" % best_score)
    print("Grid:", best_grid)
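The snippet assumes "rf" and "grid" already exist. As an illustration only (the names match the answer, but the values are assumptions), they could be set up like this; note that oob_score=True is required, otherwise oob_score_ will not be available after fitting:

    from sklearn.ensemble import RandomForestClassifier

    # oob_score=True is what makes rf.oob_score_ available after rf.fit(X, y)
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

    # Illustrative grid; substitute whatever parameter ranges you actually want to search
    grid = {"max_features": ["sqrt", "log2"], "min_samples_leaf": [1, 5, 10]}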