
Using explicit (predefined) validation set for grid search with sklearn

I have a dataset, which has previously been split into 3 sets: train, validation and test. These sets have to be used as given in order to compare the performance across different algorithms.

I would now like to optimize the parameters of my SVM using the validation set. However, I cannot find how to input the validation set explicitly into sklearn.grid_search.GridSearchCV(). Below is some code I've previously used for doing K-fold cross-validation on the training set. However, for this problem I need to use the validation set as given. How can I do that?

    from sklearn import svm, cross_validation
    from sklearn.grid_search import GridSearchCV

    # (some code left out to simplify things)

    skf = cross_validation.StratifiedKFold(y_train, n_folds=5, shuffle=True)
    clf = GridSearchCV(svm.SVC(tol=0.005, cache_size=6000,
                               class_weight=penalty_weights),
                       param_grid=tuned_parameters,
                       n_jobs=2,
                       pre_dispatch="n_jobs",
                       cv=skf,
                       scoring=scorer)
    clf.fit(X_train, y_train)
pir asked Aug 11 '15 18:08



1 Answer

Use PredefinedSplit

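    from sklearn.model_selection import PredefinedSplit  # sklearn.cross_validation in releases before 0.18
    ps = PredefinedSplit(test_fold=your_test_fold)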

Then pass cv=ps to GridSearchCV.

test_fold : array-like, shape (n_samples,)

test_fold[i] gives the test set fold of sample i. A value of -1 indicates that the corresponding sample is not part of any test set folds, but will instead always be put into the training fold.
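To see what that description means in practice, here is a minimal sketch with a made-up six-sample test_fold array (the values are purely illustrative). It assumes a recent scikit-learn in which PredefinedSplit lives in sklearn.model_selection.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical fold assignment for six samples:
# -1 = never used for testing (always in the training part), 0 = member of test fold 0.
test_fold = np.array([-1, -1, -1, -1, 0, 0])
ps = PredefinedSplit(test_fold=test_fold)

# One distinct non-negative fold label means exactly one train/test split.
n_splits = ps.get_n_splits()
splits = [(train_idx.tolist(), test_idx.tolist()) for train_idx, test_idx in ps.split()]
print(n_splits, splits)
```

With a single fold label besides -1, the splitter yields exactly one split: the samples marked -1 form the training part and the samples marked 0 form the held-out part.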

Also see here

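When using a single predefined validation set, set test_fold to 0 for every sample in the validation set and to -1 for every other sample. GridSearchCV then trains each candidate on the -1 samples only and scores it on the validation samples.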
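Putting it together, a rough sketch of the whole workflow might look like the following. The data is synthetic, the parameter grid is invented for illustration, and the imports assume a recent scikit-learn (sklearn.model_selection rather than the older sklearn.grid_search and sklearn.cross_validation modules used in the question). The key step is stacking the training and validation sets into one array so that test_fold can label each row.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

# Synthetic stand-ins for the pre-split train and validation sets (assumption).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

# Stack both sets, then mark training rows with -1 and validation rows with 0.
X_all = np.concatenate([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.concatenate([
    np.full(len(X_train), -1),          # always in the training part
    np.zeros(len(X_val), dtype=int),    # the single validation fold
])
ps = PredefinedSplit(test_fold=test_fold)

# Hypothetical parameter grid; substitute your own tuned_parameters.
tuned_parameters = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

clf = GridSearchCV(SVC(), param_grid=tuned_parameters, cv=ps)
clf.fit(X_all, y_all)
print(clf.best_params_, clf.best_score_)
```

Note that GridSearchCV refits the best estimator on everything passed to fit by default (refit=True), which here means training plus validation data. If the final model must be trained on the training set alone, pass refit=False and fit a fresh SVC with clf.best_params_ on X_train, y_train.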

yangjie answered Sep 21 '22 17:09
