
Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead

I am using scikit-learn's StratifiedKFold to split my data. When I attempt to split with multi-class targets, I get an error (see below). When I split with binary targets, it works without a problem.

num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
jKraut asked Jan 29 '18 18:01



1 Answer

keras.utils.to_categorical produces a one-hot encoded class matrix (one column per class), which scikit-learn identifies as the multilabel-indicator type mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:

split(X, y, groups=None)

[...]

y : array-like, shape (n_samples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.

i.e. your y must be a 1-D array of your class labels.
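To see the difference for yourself, here is a minimal sketch using scikit-learn's type_of_target helper. The label values are made up for illustration, and np.eye is used only as a stand-in that produces the same one-hot layout as keras.utils.to_categorical, so that the snippet runs without Keras.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical 1-D integer class labels (3 classes), for illustration only
y_labels = np.array([0, 1, 2, 1, 0, 2])

# Stand-in for keras.utils.to_categorical: one row per sample, one column per class
y_onehot = np.eye(3, dtype=int)[y_labels]

print(type_of_target(y_labels))  # multiclass
print(type_of_target(y_onehot))  # multilabel-indicator

# If only the one-hot matrix is at hand, argmax recovers the 1-D labels
y_recovered = y_onehot.argmax(axis=1)
```

The argmax step is only needed when the original 1-D labels are no longer available.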

Essentially, all you have to do is reverse the order of the operations: split first (using your initial y_train), and apply to_categorical afterwards.
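A rough sketch of the reordered loop follows. The random data and the one_hot helper are assumptions made so the snippet runs on its own; in your code you would keep your real x_train and y_train and call keras.utils.to_categorical(labels, num_classes) where one_hot appears.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in data: 30 samples, 4 features, 3 balanced classes
rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 4))
y_train = np.repeat([0, 1, 2], 10)
num_classes = len(np.unique(y_train))

def one_hot(labels, n):
    # Illustrative replacement for keras.utils.to_categorical(labels, n)
    return np.eye(n, dtype="float32")[labels]

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

folds = []
# Split on the 1-D labels, then one-hot encode each fold's targets
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    folds.append((x_train_kf, y_train_kf, x_val_kf, y_val_kf))
```

Passing num_classes explicitly keeps the encoded targets the same width in every fold.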

desertnaut answered Sep 17 '22 06:09
