I am trying to implement a hyperparameter search in sklearn using randomized search (RandomizedSearchCV) and a grouped k-fold cross-validation generator. The following works:
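import sklearn.model_selection
from sklearn.model_selection import GroupKFold, StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=skf, n_iter=10)
rs.fit(X, y)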
This doesn't:
gkf = GroupKFold(n_splits=5)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=gkf, n_iter=10)
rs.fit(X, y)
# ValueError: The groups parameter should not be None
How do I indicate the groups parameter?
Neither does this:
gkf = GroupKFold(n_splits=5)
fv = gkf.split(X, y, groups=groups)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=fv, n_iter=10)
rs.fit(X, y)
# TypeError: object of type 'generator' has no len()
For reference, the groups are passed to fit, not to the constructor or the splitter:
rs.fit(X, y, groups=groups)
where
rs = sklearn.model_selection.RandomizedSearchCV(forest, parameters, scoring='roc_auc', cv=gkf, n_iter=10)
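Here is a minimal, self-contained sketch of that call, in case it helps to see it end to end. The synthetic data, the group labels, the random forest settings and the parameter lists are all hypothetical stand-ins, not taken from the question. The second half shows one possible alternative: materializing the splits with list() and passing them as cv, which avoids handing a bare generator to the search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Hypothetical data: 200 samples, 20 groups of 10 samples each.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
groups = np.repeat(np.arange(20), 10)

# Hypothetical estimator and search space, for illustration only.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
parameters = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 2, 4]}

gkf = GroupKFold(n_splits=5)

# Option 1: pass the splitter as cv and the groups to fit().
rs = RandomizedSearchCV(forest, parameters, scoring="roc_auc",
                        cv=gkf, n_iter=5, random_state=0)
rs.fit(X, y, groups=groups)
print(rs.best_params_, rs.best_score_)

# Option 2: precompute the (train, test) index pairs as a list and
# pass that list as cv; fit() then needs no groups argument.
splits = list(gkf.split(X, y, groups=groups))
rs_alt = RandomizedSearchCV(forest, parameters, scoring="roc_auc",
                            cv=splits, n_iter=5, random_state=0)
rs_alt.fit(X, y)
print(rs_alt.best_params_, rs_alt.best_score_)
```

In both options no group appears in both the training and the validation part of a fold, which is the point of using GroupKFold here.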