
Neural Network for Imbalanced Multi-Class Multi-Label Classification

How do you deal with multi-label classification that has imbalanced labels while training neural networks? One of the solutions I came across was to penalize the error for rarely occurring labels. Here is how I designed the network:

Number of classes: 100. The input layer, 1st hidden layer, and 2nd hidden layer (100 units) are fully connected, with dropout and ReLU. The output of the 2nd hidden layer is py_x.

cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=py_x, labels=Y))

Where Y is a modified version of the one-hot encoding, with values between 1 and 5 assigned to each of a sample's labels. The value is ~1 for the most frequent label and ~5 for the rarest labels. The values are not discrete; the new value assigned to a label in the one-hot encoding is given by the formula

weight = 1 + 4 * (1 - percentage_of_label / 100)

For example, <0, 0, 1, 0, 1, ...> would be converted to something like <0, 0, 1.034, 0, 3.667, ...>. NOTE: only the values that are 1 in the original vector are changed.
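As a sketch, the re-weighting above can be written in a few lines of NumPy. The label percentages here are hypothetical, chosen so the result reproduces the 1.034 and 3.667 values from the example:

```python
import numpy as np

# Hypothetical per-label frequencies (percentage of samples carrying each
# label) for a 5-label slice of the 100-label problem.
label_percentage = np.array([50.0, 60.0, 99.15, 70.0, 33.325])

# The scaling rule from the question: ~1 for a near-ubiquitous label,
# approaching 5 for the rarest labels.
weights = 1 + 4 * (1 - label_percentage / 100)

one_hot = np.array([0, 0, 1, 0, 1], dtype=float)
weighted_target = one_hot * weights  # zeros in the one-hot stay zero

print(weighted_target)
```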

This way, if the model incorrectly predicts a rare label, its error is large, e.g. 0.0001 - 5 = -4.9999, and this back-propagates a heavier error than the mislabeling of a very frequent label.

Is this the right way to penalize? Are there better methods to deal with this problem?

asked Apr 01 '17 by melwin_jose




1 Answer

Let's answer your problem in general form. What you are facing is the class imbalance problem, and there are many ways to tackle it. Common ones are:

  1. Dataset Resampling: Make the classes balanced by changing the dataset size.
    For example, if you have 5 target classes (classes A to E), and classes A, B, C, and D have 1000 examples each while class E has 10 examples, you can simply add 990 more examples of class E (just copy them, or copy them and add some noise).
  2. Cost-Sensitive Modeling: Change the importance (weight) of the different classes.
    This is the method you used in your code, where you increased the weight of a class by a factor of at most 5.
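The resampling in option 1 can be sketched with NumPy on a hypothetical dataset (in practice a library such as imbalanced-learn's SMOTE does this more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: classes 0-3 have 1000 samples each,
# class 4 (the "class E" of the example) has only 10.
X = rng.normal(size=(4010, 8))
y = np.concatenate([np.repeat([0, 1, 2, 3], 1000), np.repeat(4, 10)])

# Oversample the minority class up to the majority count by copying
# random minority samples and adding small Gaussian noise to the copies.
target_count = 1000
minority_idx = np.where(y == 4)[0]
extra_idx = rng.choice(minority_idx, size=target_count - minority_idx.size,
                       replace=True)
X_extra = X[extra_idx] + rng.normal(scale=0.01, size=(extra_idx.size, 8))

X_balanced = np.vstack([X, X_extra])
y_balanced = np.concatenate([y, np.full(extra_idx.size, 4)])

print(np.bincount(y_balanced))  # every class now has 1000 samples
```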

Returning to your problem, the first solution is independent of your model. You just need to check whether you are able to change the dataset (add more samples to classes with few samples, or remove samples from classes with many). For the second solution, since you are working with a neural network, you have to change your loss function formula. You can define several hyperparameters (class weights or importances), train your model, and see which set of parameters works better.
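For option 2, one standard way to put class weights into the loss is to scale the positive term of the per-label sigmoid cross-entropy by a per-class weight; TensorFlow offers tf.nn.weighted_cross_entropy_with_logits for this. A minimal NumPy sketch of that computation (illustrative values, not the exact loss from the question):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_sigmoid_ce(logits, labels, pos_weight):
    """Per-label binary cross-entropy where the positive term is scaled
    by pos_weight -- the same idea as TensorFlow's
    tf.nn.weighted_cross_entropy_with_logits."""
    p = sigmoid(logits)
    return -(pos_weight * labels * np.log(p) + (1 - labels) * np.log(1 - p))

labels = np.array([0.0, 1.0, 1.0])
logits = np.array([-2.0, 0.5, 0.5])
pos_weight = np.array([1.0, 1.0, 5.0])  # hypothetical: rare third label, 5x

loss = weighted_sigmoid_ce(logits, labels, pos_weight)
print(loss)  # the third label's loss is 5x the second's
```

Note the weight only scales the gradient of positive (present) labels, which matches the intent of up-weighting rare labels without touching the absent ones.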

So to answer your question: yes, this is a right way to penalize, but you may get better accuracy by trying different weights (instead of the maximum of 5 in your example). You might also want to try dataset resampling.

For more information, you can refer to this link.

answered Sep 22 '22 by Iman Mirzadeh