 

Not able to use Stratified K-Fold on a multi-label classifier

The following code is used to do K-Fold validation, but I am unable to train the model because it throws this error:

ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,)

My target variable has 7 classes, and I am using LabelEncoder to encode the classes into numbers.

Seeing this error, I changed to MultiLabelBinarizer to encode the classes, and now I get the following error:

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

The following is the code for the K-Fold validation:

import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True)
scores = np.zeros(10)
for index, (train_indices, val_indices) in enumerate(skf.split(X, y)):
    print("Training on fold " + str(index + 1) + "/10...")
    # Generate batches from indices
    xtrain, xval = X[train_indices], X[val_indices]
    ytrain, yval = y[train_indices], y[val_indices]
    model = load_model()  # defined above
    scores[index] = train_model(model, xtrain, ytrain, xval, yval)
print(scores)
print(scores.mean())

I don't know what to do. I want to use Stratified K-Fold on my model. Please help me.

asked Feb 26 '19 by Sai Pavan


People also ask

Why is stratified k-fold cross-validation better than k-fold cross-validation?

KFold is a cross-validator that divides the dataset into k folds. Stratified means that each fold has the same proportion of observations with a given label as the dataset as a whole.
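A minimal sketch of the difference (the skewed toy labels below are invented for illustration): with an imbalanced class, plain KFold can produce folds whose class mix drifts from the full dataset, while StratifiedKFold keeps it fixed.

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

for name, cv in [("KFold", KFold(n_splits=5, shuffle=True, random_state=0)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=0))]:
    # Fraction of class 1 in each validation fold
    print(name, [round(y[val].mean(), 2) for _, val in cv.split(X, y)])
# StratifiedKFold keeps every fold at exactly 10% class 1; KFold may not.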

How do you use stratified cross-validation?

In machine learning, when we want to train a model we split the dataset into a training set and a test set, typically using the train_test_split() function from sklearn, then train the model on the training set and evaluate it on the test set. Stratified splitting additionally preserves the class proportions in both sets.
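For a single split, the same idea is available through train_test_split's stratify argument; a small sketch with made-up data:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)  # 80/20 class ratio, chosen for illustration

# stratify=y preserves the 80/20 class ratio in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print(y_train.mean(), y_test.mean())  # both ~0.2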

What is stratified cross-validation and when should we use it?

Implementing the concept of stratified sampling in cross-validation ensures the training and test sets have the same proportion of the feature of interest as the original dataset. Doing this with the target variable ensures that the cross-validation result is a close approximation of the generalization error.

How do you stratify multi-label data?

The method follows the approaches outlined in the Sechidis (2011) and Szymański (2017) papers on stratifying multi-label data. In general, what we expect from a given stratification output is that each stratum, or fold, is close to a given demanded size, usually equal to 1/k in the k-fold approach, or to an x% train-to-test division in a two-fold split.
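Those papers are implemented as IterativeStratification in the scikit-multilearn package; a hedged sketch, assuming scikit-multilearn is installed and using random indicator labels purely for illustration:

import numpy as np
from skmultilearn.model_selection import IterativeStratification

np.random.seed(0)
X = np.random.rand(100, 5)
Y = np.random.randint(0, 2, (100, 4))  # multi-label indicator matrix

# order=1 balances the proportion of each single label across folds;
# order=2 would also balance label pairs
k_fold = IterativeStratification(n_splits=5, order=1)
for train_idx, test_idx in k_fold.split(X, Y):
    print(len(train_idx), len(test_idx), Y[test_idx].mean(axis=0).round(2))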

How do you use StratifiedKFold for multi-label classification?

You can reframe the problem as simple multi-class classification instead of multi-label classification by encoding each label combination as a single class. You can then use StratifiedKFold directly with y_new as your target, and map your labels back once the splits are done.

What is stratified k-fold cross-validator?

Stratified K-Folds cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.

Are all the folds stratified?

It seems OK so far: every fold contains a stratified sample, len(df_folds[df_folds['fold'] == fold_number].index) gives the expected fold size, and no two folds intersect, which can be verified with set(A).intersection(B) where A and B are the index values (image_id) of two folds.
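A quick way to run that disjointness check is to collect each fold's validation indices and assert that every pairwise intersection is empty; a sketch with illustrative data:

from itertools import combinations

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(50).reshape(-1, 1)
y = np.array([0, 1] * 25)

folds = [set(val) for _, val in StratifiedKFold(n_splits=5).split(X, y)]
for (i, a), (j, b) in combinations(enumerate(folds), 2):
    assert not a & b, "folds {} and {} overlap".format(i, j)
print("all validation folds are pairwise disjoint")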


1 Answer

MultiLabelBinarizer returns an indicator vector whose length equals the number of classes.

If you look at how StratifiedKFold splits your dataset, you will see that it only accepts a one-dimensional target variable, whereas you are trying to pass a target of shape [n_samples, n_classes].

A stratified split basically preserves your class distribution, and if you think about it, per-class stratification does not make a lot of sense for a multi-label classification problem.

If you want to preserve the distribution in terms of the different combinations of classes in your target variable, then the answer here explains two ways in which you can define your own stratified split function.

UPDATE:

The logic is something like this:

Assume you have n classes and your target variable is a combination of these n classes; then there are 2^n - 1 possible combinations (not including all zeros). You can now create a new target variable by treating each combination as a new label.

For example, if n=3, you will have 7 unique combinations:

 1. [1, 0, 0]
 2. [0, 1, 0]
 3. [0, 0, 1]
 4. [1, 1, 0]
 5. [1, 0, 1]
 6. [0, 1, 1]
 7. [1, 1, 1]

Map all your labels to this new target variable. You can now look at your problem as simple multi-class classification, instead of multi-label classification.

Now you can directly use StratifiedKFold with y_new as your target. Once the splits are done, you can map your labels back.

Code sample:

import numpy as np

np.random.seed(1)
y = np.random.randint(0, 2, (10, 7))
y = y[y.sum(axis=1) != 0]  # drop rows with no labels at all

OUTPUT:

array([[1, 1, 0, 0, 1, 1, 1],
       [1, 1, 0, 0, 1, 0, 1],
       [1, 0, 0, 1, 0, 0, 0],
       [1, 0, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 1, 1, 1],
       [1, 1, 0, 0, 0, 1, 1],
       [1, 1, 1, 1, 0, 1, 1],
       [0, 0, 1, 0, 0, 1, 1],
       [1, 0, 1, 0, 0, 1, 1],
       [0, 1, 1, 1, 1, 0, 0]])

Label encode your class vectors:

from sklearn.preprocessing import LabelEncoder

def get_new_labels(y):
    # Represent each row (label combination) as a string, then encode it as one class
    y_new = LabelEncoder().fit_transform([''.join(map(str, row)) for row in y])
    return y_new

y_new = get_new_labels(y)

OUTPUT:

array([7, 6, 3, 3, 2, 5, 8, 0, 4, 1])
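Putting it together, a sketch of the full loop on a larger made-up dataset (in practice every label combination needs at least n_splits samples, otherwise StratifiedKFold will complain): split on y_new for stratification, but index the original multi-label matrix y for training.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder

np.random.seed(1)
X = np.random.rand(500, 20)            # hypothetical feature matrix
y = np.random.randint(0, 2, (500, 3))  # 3 labels -> up to 7 non-zero combinations
keep = y.sum(axis=1) != 0
X, y = X[keep], y[keep]

# Collapse each label combination into a single multi-class target
y_new = LabelEncoder().fit_transform([''.join(map(str, row)) for row in y])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y_new)):
    # Stratify on the combination labels, but train on the original rows of y
    xtrain, xval = X[train_idx], X[val_idx]
    ytrain, yval = y[train_idx], y[val_idx]
    print("fold {}: train={}, val={}".format(fold, len(train_idx), len(val_idx)))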
answered Oct 10 '22 by panktijk