I'm trying to use TensorFlow's DNNClassifier for a multi-class (softmax) classification problem with 4 different classes, and my dataset is imbalanced.
How do I assign the weights for the DNNClassifier's weight_column
for each class? I know how to code this, but I'm wondering what values I should give each class.
One way of handling an imbalanced dataset is SMOTE-style oversampling: select a random sample from the minority class, find its k nearest neighbors by Euclidean distance, take the difference between the sample and one of those neighbors, multiply it by a random number between 0 and 1, and add the result to the sample to create a synthetic minority-class example.
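Here is a minimal NumPy sketch of that oversampling idea; the function name make_synthetic_samples, the default k, and the toy 2-D points are illustrative, not from the question.

```python
import numpy as np

def make_synthetic_samples(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        # pick a random minority sample
        i = rng.integers(len(minority))
        x = minority[i]
        # Euclidean distances to all minority samples
        d = np.linalg.norm(minority - x, axis=1)
        # indices of the k nearest neighbors (position 0 is the sample itself)
        neighbors = np.argsort(d)[1:k + 1]
        x_nn = minority[rng.choice(neighbors)]
        # interpolate: x + gap * (neighbor - x), with gap in [0, 1)
        gap = rng.random()
        synthetic.append(x + gap * (x_nn - x))
    return np.array(synthetic)

# Example: oversample a tiny minority class of 2-D points
minority_class = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
print(make_synthetic_samples(minority_class, n_new=3))
```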
Imbalanced classification is particularly hard because of the severely skewed class distribution and the unequal misclassification costs. The difficulty is compounded by properties such as dataset size, label noise, and data distribution.
Many practical classification problems are imbalanced. The class imbalance problem typically occurs when there are many more instances of some classes than others. In such cases, standard classifiers tend to be overwhelmed by the large classes and ignore the small ones.
You can try the following formula to balance all classes:
weight_for_class_X = total_samples_size / size_of_class_X / num_classes
For example:
num_CLASS_0: 10000
num_CLASS_1: 1000
num_CLASS_2: 100
wgt_for_0 = 11100 / 10000 / 3 = 0.37
wgt_for_1 = 11100 / 1000 / 3 = 3.7
wgt_for_2 = 11100 / 100 / 3 = 37
# so after one epoch of training, the total weight of each class will be:
total_wgt_of_0 = 0.37 * 10000 = 3700
total_wgt_of_1 = 3.7 * 1000 = 3700
total_wgt_of_2 = 37 * 100 = 3700
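Here is a minimal sketch of computing those weights and passing them to DNNClassifier through weight_column, assuming a TensorFlow version that still ships tf.estimator (1.x–2.15). The feature name "x", the weight column name "weight", and the toy data are illustrative.

```python
import numpy as np
import tensorflow as tf

counts = {0: 10000, 1: 1000, 2: 100}      # samples per class
total = sum(counts.values())               # 11100
num_classes = len(counts)                  # 3

# weight_for_class_X = total_samples_size / size_of_class_X / num_classes
class_weights = {c: total / n / num_classes for c, n in counts.items()}
# {0: 0.37, 1: 3.7, 2: 37.0}

def input_fn(features, labels, batch_size=128):
    # Attach a per-example weight derived from each example's class.
    weights = np.array([class_weights[y] for y in labels], dtype=np.float32)
    ds = tf.data.Dataset.from_tensor_slices(
        ({"x": features, "weight": weights}, labels))
    return ds.shuffle(len(labels)).batch(batch_size)

# Toy data: 60 examples with 4 numeric features each (illustrative only)
train_x = np.random.rand(60, 4).astype(np.float32)
train_y = np.random.randint(0, num_classes, size=60)

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

classifier = tf.estimator.DNNClassifier(
    hidden_units=[32, 16],
    feature_columns=feature_columns,
    n_classes=num_classes,
    weight_column="weight",   # name of the weight feature in the input dict
)

classifier.train(input_fn=lambda: input_fn(train_x, train_y), steps=5)
```

With these weights, the loss contribution of each class over an epoch is roughly equal, which is exactly what the arithmetic above shows.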