Is
class sklearn.cross_validation.ShuffleSplit(
n,
n_iterations=10,
test_fraction=0.10000000000000001,
indices=True,
random_state=None
)
the right way for 10*10fold CV in scikit-learn? (By changing the random_state to 10 different numbers)
Because I didn't find any random_state
parameter in Stratified K-Fold
or K-Fold
and the separate from K-Fold
are always identical for the same data.
If ShuffleSplit
is the right, one concern is that it is mentioned
Note: contrary to other cross-validation strategies, random splits do not guarantee that all folds will be different, although this is still very likely for sizeable datasets
Is this always the case for 10*10 fold CV?
I am not sure what you mean by 10*10 cross validation. The ShuffleSplit configuration you give will make you call the fit method of the estimator 10 times. If you call this 10 times by explicitly using an outer loop or directly call it 100 times with 10% of the data reserved for testing in a single loop if you use instead:
>>> ss = ShuffleSplit(X.shape[0], n_iterations=100, test_fraction=0.1,
... random_state=42)
If you want to do 10 runs of StratifiedKFold with k=10 you can shuffle the dataset between the runs (that would lead to a total 100 calls to the fit method with a 90% train / 10% test split for each call to fit):
>>> from sklearn.utils import shuffle
>>> from sklearn.cross_validation import StratifiedKFold, cross_val_score
>>> for i in range(10):
... X, y = shuffle(X_orig, y_orig, random_state=i)
... skf = StratifiedKFold(y, 10)
... print cross_val_score(clf, X, y, cv=skf)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With