Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to do this.
I saw that cross_validation.KFold() has a shuffling flag, but it is not stratified. Unfortunately, cross_validation.StratifiedKFold() does not have such an option, and cross_validation.StratifiedShuffleSplit() does not produce disjoint folds.
Am I missing something? Is this planned?
(Obviously, I could implement this myself.)
Stratified k-fold cross-validation is a variant of k-fold cross-validation used for classification problems: each of the k folds keeps approximately the same class ratio as the original dataset.
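To make the class-ratio property concrete, here is a minimal sketch using the modern sklearn.model_selection API (assuming scikit-learn 0.22 or later; the 60/40 toy labels are hypothetical). It counts the classes in each test fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy labels: 60 samples of class 0 and 40 of class 1 (a 60/40 ratio).
y = np.array([0] * 60 + [1] * 40)
X = np.zeros((len(y), 1))  # feature values play no role in how the folds are built

skf = StratifiedKFold(n_splits=5)
fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count how many samples of each class land in this test fold.
    fold_counts.append(np.bincount(y[test_idx]).tolist())

print(fold_counts)  # [[12, 8], [12, 8], [12, 8], [12, 8], [12, 8]]
```

Every test fold keeps the original 60/40 ratio. When the class counts are not divisible by the number of folds, the ratio can only be matched approximately.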
KFold (K-Folds cross-validator) provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default); each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
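A short sketch of the consecutive-fold behaviour and what the shuffle flag changes, assuming the modern sklearn.model_selection.KFold and 10 hypothetical samples:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 hypothetical samples

# Default: no shuffling, so each test fold is a consecutive block of indices.
plain = [test.tolist() for _, test in KFold(n_splits=5).split(X)]
print(plain)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# shuffle=True permutes the indices once before splitting;
# random_state makes that permutation reproducible.
shuffled = [
    test.tolist()
    for _, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X)
]

# Either way, the test folds are disjoint and together cover every sample exactly once.
assert sorted(i for fold in shuffled for i in fold) == list(range(10))
```

Note that KFold looks only at the number of samples, never at the labels, which is why shuffling alone does not make it stratified.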
The train_test_split function in the sklearn.model_selection package splits arrays or matrices into random train and test subsets.
Scikit-learn supports stratified splitting in more than one place. StratifiedKFold divides the dataset into the requested number of folds so that each fold preserves the class proportions of the full dataset. Stratification can also be achieved for a single train/test split by passing the stratify parameter to train_test_split.
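As an illustration of the stratify parameter, a minimal sketch (assuming scikit-learn 0.18 or later; the 60/40 labels are hypothetical toy data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 60 + [1] * 40)      # hypothetical labels with a 60/40 class ratio
X = np.arange(len(y)).reshape(-1, 1)   # dummy features

# stratify=y asks train_test_split to keep the 60/40 ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(np.bincount(y_train), np.bincount(y_test))  # [45 30] [15 10]
```

This gives one stratified split only; for k disjoint stratified folds you still need StratifiedKFold.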
A shuffle flag for cross_validation.StratifiedKFold was introduced in version 0.15:
http://scikit-learn.org/0.15/modules/generated/sklearn.cross_validation.StratifiedKFold.html
This can be found in the Changelog:
http://scikit-learn.org/stable/whats_new.html#new-features
Shuffle option for cross_validation.StratifiedKFold. By Jeffrey Blackburne.
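The cross_validation module was later replaced by sklearn.model_selection (added in 0.18, with cross_validation removed in 0.20), where StratifiedKFold takes n_splits and receives the labels in split() rather than in its constructor. A minimal sketch of shuffled stratified k-fold with that newer API, assuming scikit-learn 0.22 or later and a synthetic dataset chosen only for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary problem (roughly 70/30), purely for illustration.
X, y = make_classification(n_samples=200, weights=[0.7, 0.3], random_state=0)

# Shuffled stratified k-fold: each class's samples are shuffled before being
# split into folds; random_state fixes the shuffle so results are reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# The folds are still disjoint: every sample appears in exactly one test fold.
test_folds = [test for _, test in cv.split(X, y)]
assert sorted(np.concatenate(test_folds).tolist()) == list(range(len(y)))

# Pass the splitter object as cv= to evaluate a model on these folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean())
```

Passing the configured splitter object (rather than an integer) to cv is what makes cross_val_score use shuffled, stratified folds.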