Randomized stratified k-fold cross-validation in scikit-learn?

Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to do this.

I saw that cross_validation.KFold() has a shuffling flag, but it is not stratified. Unfortunately, cross_validation.StratifiedKFold() does not have such an option, and cross_validation.StratifiedShuffleSplit() does not produce disjoint folds.
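To illustrate the last point: each StratifiedShuffleSplit split is drawn independently, so nothing stops a sample from landing in several test sets. This is a minimal sketch that assumes a recent scikit-learn, where these classes live in sklearn.model_selection (the cross_validation module named above was deprecated in 0.18 and removed in 0.20). The toy labels are made up. With 10 samples, 5 splits and 3 test samples per split, the test sets hold 15 entries in total, so some samples must repeat.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy labels (made up for illustration): 6 samples of class 0, 4 of class 1.
y = np.array([0] * 6 + [1] * 4)
X = np.zeros((len(y), 1))  # feature values do not affect how the splits are drawn

# test_size=3 is an absolute count: 3 test samples per split.
sss = StratifiedShuffleSplit(n_splits=5, test_size=3, random_state=0)
test_sets = [test for _, test in sss.split(X, y)]

all_test = np.concatenate(test_sets)
# 15 test slots but only 10 distinct samples, so the test sets cannot be disjoint.
print(len(all_test), len(np.unique(all_test)))
```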

Am I missing something? Is this planned?

(Obviously, I could implement this myself.)
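For reference, a do-it-yourself version might look like the sketch below. It is a hypothetical helper (shuffled_stratified_kfold is not a scikit-learn function) that uses only NumPy: it shuffles the indices of each class separately, then deals them round-robin into k folds. Every fold therefore gets a near-equal share of each class, and the test folds are disjoint.

```python
import numpy as np


def shuffled_stratified_kfold(y, n_splits, seed=None):
    """Hypothetical helper: yield (train_idx, test_idx) for shuffled stratified k-fold."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)  # fold number assigned to each sample
    offset = 0
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)  # randomize order within this class only
        # Deal this class's samples round-robin over the folds, continuing where the
        # previous class stopped so leftover samples do not all pile into fold 0.
        fold_of[idx] = (np.arange(len(idx)) + offset) % n_splits
        offset += len(idx)
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


# Usage with made-up labels: 6 samples of class 0, 4 of class 1, 2 folds.
y = np.array([0] * 6 + [1] * 4)
folds = list(shuffled_stratified_kfold(y, n_splits=2, seed=0))
for train, test in folds:
    print(np.bincount(y[test]))  # each test fold: 3 of class 0, 2 of class 1
```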

Bitwise asked May 08 '13 19:05



People also ask

What is stratified k-fold cross-validation?

Stratified k-fold cross-validation is a variant of k-fold cross-validation used for classification problems. Each of the k folds keeps the same class ratio as the original dataset.
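A small illustration, assuming a recent scikit-learn (0.18 or later, where the splitters live in sklearn.model_selection) and made-up labels with a 2:1 class ratio. Every test fold produced by StratifiedKFold keeps that ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (made up): 8 samples of class 0 and 4 of class 1, a 2:1 ratio.
y = np.array([0] * 8 + [1] * 4)
X = np.zeros((len(y), 1))  # StratifiedKFold only uses y to decide the folds

skf = StratifiedKFold(n_splits=4)
fold_counts = [np.bincount(y[test]).tolist() for _, test in skf.split(X, y)]
print(fold_counts)  # [[2, 1], [2, 1], [2, 1], [2, 1]]
```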

What is KFold in scikit-learn?

K-Folds cross-validator. Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used once as the validation set, while the remaining k - 1 folds form the training set.
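A quick sketch of the "consecutive folds" behaviour, assuming sklearn.model_selection.KFold from a recent scikit-learn and 10 toy samples. Without shuffling, each test fold is a contiguous block of indices; with shuffle=True the indices are permuted once before being cut into folds.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 toy samples

# Default: no shuffling, so each test fold is a consecutive block of indices.
plain = [test.tolist() for _, test in KFold(n_splits=5).split(X)]
print(plain)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# shuffle=True permutes the indices once before cutting them into folds;
# random_state makes that permutation reproducible.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
shuffled = [test.tolist() for _, test in kf.split(X)]
print(shuffled)
```

Note that shuffling here ignores the class labels entirely, which is why it does not give stratified folds.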

What is train_test_split in sklearn.model_selection?

The train_test_split function in the sklearn.model_selection module splits arrays or matrices into random train and test subsets.
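A minimal usage sketch with made-up data, assuming a recent scikit-learn. The rows of X and y are shuffled together, so features and targets stay aligned in each subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy rows: row i is [2*i, 2*i + 1]
y = np.arange(10)                 # one made-up target per row

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```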

What is stratified sampling in scikit-learn?

Scikit-learn provides two classes for stratified splitting: StratifiedKFold, which splits the dataset into k folds so that each fold preserves the class proportions of the full dataset, and StratifiedShuffleSplit, which draws randomized stratified train/test splits. Stratification can also be achieved with train_test_split by passing the class labels to its "stratify" parameter.
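A sketch of the stratify parameter, assuming a recent scikit-learn and made-up labels with a 2:1 class ratio. Passing the labels as stratify keeps that ratio in both the training and the test subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels (made up): 8 samples of class 0 and 4 of class 1.
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the 2:1 class ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print(np.bincount(y_train), np.bincount(y_test))  # [6 3] [2 1]
```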


1 Answer

A shuffling flag for cross_validation.StratifiedKFold was introduced in version 0.15:

http://scikit-learn.org/0.15/modules/generated/sklearn.cross_validation.StratifiedKFold.html

This can be found in the Changelog:

http://scikit-learn.org/stable/whats_new.html#new-features

Shuffle option for cross_validation.StratifiedKFold. By Jeffrey Blackburne.
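In current scikit-learn releases the class lives in sklearn.model_selection (the cross_validation module was deprecated in 0.18 and removed in 0.20), and the option is spelled shuffle=True, with random_state making the shuffle reproducible. Below is a minimal sketch with made-up labels; DummyClassifier is used only so the scores are easy to predict. The folds come out shuffled, stratified and disjoint, which is what the question asks for.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy data (made up): 10 samples of class 0 and 5 of class 1.
y = np.array([0] * 10 + [1] * 5)
X = np.zeros((len(y), 1))

# Shuffles each class's samples before assigning them to folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

test_folds = [test for _, test in skf.split(X, y)]
for test in test_folds:
    # Exact indices depend on random_state; each fold holds 2 of class 0 and 1 of class 1.
    print(sorted(test.tolist()), np.bincount(y[test]))

# The splitter can be passed directly as cv=...
# Each training set is 8:4, so the dummy always predicts 0 and scores 2/3 per fold.
scores = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=skf)
print(scores)
```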

Mutabor answered Oct 04 '22 00:10
