
sklearn grid search with grouped K fold cv generator

I am trying to run a randomized hyperparameter search in sklearn with a grouped k-fold cross-validation generator. The following works with StratifiedKFold:

import sklearn.model_selection
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=skf, n_iter=10)
rs.fit(X, y)

This doesn't:

import sklearn.model_selection
from sklearn.model_selection import GroupKFold

gkf = GroupKFold(n_splits=5)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=gkf, n_iter=10)
rs.fit(X, y)

#ValueError: The groups parameter should not be None

How do I indicate the groups parameter?

Neither does this:

gkf = GroupKFold(n_splits=5)
fv = gkf.split(X, y, groups=groups)
rs = sklearn.model_selection.RandomizedSearchCV(clf, parameters, scoring='roc_auc', cv=gkf, n_iter=10)
rs.fit(X, y)

#TypeError: object of type 'generator' has no len()
user0 asked Mar 17 '17 14:03




1 Answer

For reference, the groups are passed as a keyword argument to fit:

rs.fit(X,y,groups=groups)

where rs is built with the GroupKFold instance as cv:

rs=sklearn.model_selection.RandomizedSearchCV(forest,parameters,scoring='roc_auc',cv=gkf,n_iter=10)
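A minimal, self-contained sketch of the same call, for anyone who wants to run it end to end. Everything outside the fit call is an assumption for illustration and not from the original question: the synthetic data from make_classification, the RandomForestClassifier standing in for forest, the small parameters dictionary, and a made-up groups array of 20 groups with 10 samples each. The second search shows one possible alternative to the generator attempt from the question, which is to materialise the splits into a list before passing them as cv.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical group labels: 20 groups of 10 consecutive samples each.
groups = np.repeat(np.arange(20), 10)

# Stand-ins for the `forest` and `parameters` objects from the answer.
forest = RandomForestClassifier(random_state=0)
parameters = {"n_estimators": [10, 20, 30], "max_depth": [2, 4, None]}

gkf = GroupKFold(n_splits=5)

# Option 1: pass the splitter as cv and hand the groups to fit().
rs = RandomizedSearchCV(
    forest, parameters, scoring="roc_auc", cv=gkf, n_iter=4, random_state=0
)
rs.fit(X, y, groups=groups)
print(rs.best_params_)

# Option 2: precompute the (train, test) index pairs as a list and pass
# that list as cv; fit() then needs no groups argument.
splits = list(gkf.split(X, y, groups=groups))
rs_list = RandomizedSearchCV(
    forest, parameters, scoring="roc_auc", cv=splits, n_iter=4, random_state=0
)
rs_list.fit(X, y)
print(rs_list.best_params_)
```

In both variants no group appears in the training and the test part of the same fold, which is the point of using GroupKFold over StratifiedKFold here.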
user0 answered Oct 19 '22 04:10
