
Custom folds for cross-validation in scikit-learn

I would like to use GridSearchCV (with n_jobs > 1) for a particular classifier, but I have information about the folds for 10-fold cross-validation from another source. Is there some way to pass in data already divided into folds instead of using the folds created by GridSearchCV?

Thanks!

asked Aug 15 '13 by user1953384

1 Answer

You can create a custom CV iterator, for instance by taking inspiration from LeaveOneGroupOut or LeavePGroupsOut, to implement the fold structure you are interested in.
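If the fold membership of each sample is known ahead of time, scikit-learn's PredefinedSplit can also express it directly, without writing an iterator by hand. A minimal sketch (the fold assignment here is made up):

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# test_fold[i] is the fold in which sample i appears as a test sample;
# -1 would keep a sample in the training set of every split.
# (Hypothetical fold assignment for 10 samples and 3 folds.)
test_fold = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
ps = PredefinedSplit(test_fold)

# PredefinedSplit yields (train_indices, test_indices) pairs,
# so it can be passed as cv= to cross_val_score or GridSearchCV.
for train_idx, test_idx in ps.split():
    print(train_idx, test_idx)
```

Each split's test set is exactly the samples assigned to that fold, and the training set is everything else.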

Alternatively, you can prepare your own precomputed folds as a list of (train, test) pairs of integer arrays (holding sample indices between 0 and n_samples - 1) and then pass that list as the cv argument of the cross_val_score and GridSearchCV utilities:

>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score
>>> X, y = make_classification(n_samples=10)
>>> cv_splits = [
...     (np.array([0, 1, 2, 3]), np.array([4, 5, 6])),
...     (np.array([1, 2, 3, 4]), np.array([5, 6, 7])),
...     (np.array([5, 6, 8, 9]), np.array([1, 2, 3, 4])),
... ]
>>> cross_val_score(LogisticRegression(), X, y, cv=cv_splits)
array([1.        , 0.33333333, 0.75      ])
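The same fold list works for GridSearchCV, including with n_jobs > 1, which is what the question asks for. A minimal sketch with made-up data and a hypothetical parameter grid (the labels alternate so every training fold contains both classes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Tiny synthetic dataset: 10 samples, 2 features, alternating labels.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hypothetical precomputed folds: one (train_indices, test_indices)
# pair per split.
cv_splits = [
    (np.array([0, 1, 2, 3]), np.array([4, 5, 6])),
    (np.array([1, 2, 3, 4]), np.array([5, 6, 7])),
    (np.array([5, 6, 8, 9]), np.array([1, 2, 3, 4])),
]

search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=cv_splits,  # the precomputed folds, not an integer or a splitter
    n_jobs=2,      # parallel fits work fine with an explicit fold list
)
search.fit(X, y)
print(search.best_params_)
```

Each candidate parameter setting is fitted once per (train, test) pair, exactly as with a built-in splitter.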
answered Sep 29 '22 by ogrisel