I would like to use GridSearchCV (with n_jobs > 1) for a particular classifier, but I already have the fold assignments for 10-fold cross-validation from another source. Is there some way to pass data that is already divided into folds, instead of using the folds created by GridSearchCV?
Thanks!
You can create a custom CV iterator, for instance by taking inspiration from LeaveOneGroupOut or GroupKFold, to implement the fold structure you are interested in.
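A minimal sketch of such an iterator, assuming the external fold information arrives as one fold label per sample. The class name FixedFoldsCV is hypothetical; it only follows the split / get_n_splits protocol that scikit-learn's own splitters expose, which is what the cv argument looks for.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


class FixedFoldsCV:
    """Hypothetical splitter built from externally supplied fold labels.

    fold_ids[i] is the fold that sample i belongs to. Each distinct fold is
    used once as the test set while all remaining samples form the train set.
    """

    def __init__(self, fold_ids):
        self.fold_ids = np.asarray(fold_ids)

    def get_n_splits(self, X=None, y=None, groups=None):
        # One split per distinct fold label
        return len(np.unique(self.fold_ids))

    def split(self, X=None, y=None, groups=None):
        for fold in np.unique(self.fold_ids):
            test_idx = np.flatnonzero(self.fold_ids == fold)
            train_idx = np.flatnonzero(self.fold_ids != fold)
            yield train_idx, test_idx


X, y = make_classification(n_samples=100, random_state=0)
# Stand-in for the fold assignments obtained from another source (assumption)
fold_ids = np.arange(len(y)) % 10
cv = FixedFoldsCV(fold_ids)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
print(scores)
```

If the fold labels are available as an array like this, passing them as groups to LeaveOneGroupOut (for example GridSearchCV(..., cv=LeaveOneGroupOut()).fit(X, y, groups=fold_ids)) should produce the same one-fold-out splits without writing a class.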
Alternatively, you can prepare your own precomputed folds as a list of (train_indices, test_indices) pairs, where each element is an array of integer sample indices between 0 and n_samples - 1, and pass that list as the cv argument of the cross_val_score and GridSearchCV utilities:
>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import cross_val_score
>>> X, y = make_classification(n_samples=10)
>>> # Each fold is a (train_indices, test_indices) pair
>>> cv_splits = [
...     (np.array([0, 1, 2, 3]), np.array([4, 5, 6])),
...     (np.array([1, 2, 3, 4]), np.array([5, 6, 7])),
...     (np.array([5, 6, 8, 9]), np.array([1, 2, 3, 4])),
... ]
>>> # The exact scores depend on the (unseeded) random data
>>> cross_val_score(LogisticRegression(), X, y, cv=cv_splits)
array([1.        , 0.33333333, 0.75      ])
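The same list of splits can be given to GridSearchCV, including with n_jobs > 1. A minimal sketch, assuming the external source provides one fold label (0 to 9) per sample; the fold_ids array below is a made-up stand-in for that data. scikit-learn's PredefinedSplit(test_fold=fold_ids) is another option that accepts such a label array directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, random_state=0)

# Hypothetical fold labels from "another source": sample i is in fold fold_ids[i]
fold_ids = np.arange(len(y)) % 10

# Convert the labels into the (train_indices, test_indices) pairs that cv accepts
cv_splits = [
    (np.flatnonzero(fold_ids != k), np.flatnonzero(fold_ids == k))
    for k in range(10)
]

search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv_splits,
    n_jobs=2,  # candidate/fold fits are dispatched to 2 worker processes
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```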