Custom folds for cross-validation in scikit-learn

Question

I would like to use GridSearchCV (with n_jobs > 1) for a particular classifier, but I already have the folds for 10-fold cross-validation from another source. Is there some way to pass in data that is already divided into folds, instead of using the folds created by GridSearchCV?

Thanks!

ogrisel · Accepted Answer

You can create a custom CV iterator that implements the fold structure you are interested in, for instance by taking inspiration from LeaveOneGroupOut or LeavePGroupsOut.
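A minimal sketch of such a custom iterator, assuming the external source gives you one fold label per sample. The class name `FoldLabelCV` and the toy fold assignment are hypothetical, not part of scikit-learn. The object only needs the `split` and `get_n_splits` methods that the `cv` argument of scikit-learn's model selection tools relies on:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


class FoldLabelCV:
    """Hypothetical CV splitter driven by a per-sample fold label array.

    Sample i is placed in the test set of the fold labelled fold_labels[i]
    and in the training set of every other fold.
    """

    def __init__(self, fold_labels):
        self.fold_labels = np.asarray(fold_labels)

    def get_n_splits(self, X=None, y=None, groups=None):
        # One split per distinct fold label
        return len(np.unique(self.fold_labels))

    def split(self, X, y=None, groups=None):
        if len(X) != len(self.fold_labels):
            raise ValueError("fold_labels must have one entry per sample")
        for label in np.unique(self.fold_labels):
            test_mask = self.fold_labels == label
            # np.flatnonzero turns the boolean masks into integer index arrays
            yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


X, y = make_classification(n_samples=20, random_state=0)
fold_labels = np.arange(20) % 4  # hypothetical fold assignment from another source
scores = cross_val_score(LogisticRegression(), X, y, cv=FoldLabelCV(fold_labels))
print(scores)
```

For the plain "one fold label per sample" case, scikit-learn also ships `sklearn.model_selection.PredefinedSplit`, which takes such an array directly as `test_fold` (a value of -1 keeps that sample out of every test set), so a hand-written class is mainly useful when the fold logic is more involved.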

Alternatively, you can precompute your own folds as a list of (train_indices, test_indices) pairs, where each element is an array of integer sample indices between 0 and n_samples - 1, and pass that list as the cv argument of cross_val_score or GridSearchCV:

>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score
>>> X, y = make_classification(n_samples=10)
>>> # each element is a (train_indices, test_indices) pair
>>> cv_splits = [
...     (np.array([0, 1, 2, 3]), np.array([4, 5, 6])),
...     (np.array([1, 2, 3, 4]), np.array([5, 6, 7])),
...     (np.array([5, 6, 8, 9]), np.array([1, 2, 3, 4])),
... ]
>>> # scores vary between runs because make_classification is not seeded here
>>> cross_val_score(LogisticRegression(), X, y, cv=cv_splits)
array([1.        , 0.33333333, 0.75      ])
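Since the question is about GridSearchCV with n_jobs > 1, here is a sketch of the same precomputed-folds approach applied to a grid search. It assumes the external source provides one fold label per sample; the `fold_labels` array and the parameter grid below are made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=30, random_state=0)

# Hypothetical fold assignment "from another source": sample i is in fold i % 10
fold_labels = np.arange(len(y)) % 10

# Turn the labels into the (train_indices, test_indices) pairs that `cv` accepts.
# A list (unlike a generator) can also be reused across several calls.
cv_splits = [
    (np.flatnonzero(fold_labels != k), np.flatnonzero(fold_labels == k))
    for k in range(10)
]

search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv_splits,
    n_jobs=2,  # the (candidate, fold) fits are run in parallel via joblib
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The per-fold scores end up in `search.cv_results_` under keys such as `split0_test_score`, in the same order as the pairs in `cv_splits`, which makes it easy to compare them with results obtained elsewhere on the same folds.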

Tags: python, scikit-learn

user1953384
