Imbalanced classes in multi-class classification problem

I'm trying to use TensorFlow's DNNClassifier for my multi-class (softmax) classification problem with 4 different classes. I have an imbalanced dataset with the following distribution:

  • Class 0: 14.8%
  • Class 1: 35.2%
  • Class 2: 27.8%
  • Class 3: 22.2%

How do I assign the weights for the DNNClassifier's weight_column for each class? I know how to code this, but I am wondering what value I should give each class.

asked Sep 18 '18 10:09 by wieus


1 Answer

You can try the following formula to balance all classes:

weight_for_class_X = total_samples_size / size_of_class_X / num_classes

For example:

num_CLASS_0: 10000   
num_CLASS_1: 1000
num_CLASS_2: 100

wgt_for_0 = 11100 / 10000 / 3 = 0.37  
wgt_for_1 = 11100 / 1000 / 3 = 3.7
wgt_for_2 = 11100 / 100 / 3 = 37

# So after one epoch of training, the total weight of each class is the same:
total_wgt_of_0 = 0.37 * 10000 = 3700
total_wgt_of_1 = 3.7 * 1000 = 3700
total_wgt_of_2 = 37 * 100 = 3700
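
Applied to the distribution in the question, the same formula can be sketched in a few lines of Python. This is only an illustration: the function name `balanced_class_weights` and the `labels` list are made up for the example, and the class shares are used in place of raw counts because only the ratios matter. The resulting weights come out to roughly 1.69, 0.71, 0.90 and 1.13 for classes 0 to 3.

```python
def balanced_class_weights(class_sizes):
    """Return {label: total / size / num_classes} for every class.

    class_sizes maps each label to its sample count (or its share of the
    dataset; the formula depends only on the ratios between classes).
    """
    total = sum(class_sizes.values())
    num_classes = len(class_sizes)
    return {
        label: total / size / num_classes
        for label, size in class_sizes.items()
    }


# Class shares from the question, in percent.
question_shares = {0: 14.8, 1: 35.2, 2: 27.8, 3: 22.2}
weights = balanced_class_weights(question_shares)

for label, weight in weights.items():
    print(f"class {label}: weight {weight:.2f}")

# Hypothetical labels, one per training example. Each example gets the
# weight of its class; a list like this is what would be stored in the
# column that weight_column points to.
labels = [0, 1, 1, 2, 3, 1]
example_weights = [weights[y] for y in labels]
```

With these weights, every class contributes the same total weight per epoch (share times weight is equal for all four classes), which is the balancing effect shown in the example above.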

answered Sep 28 '22 11:09 by FelixHo