I am trying to apply deep learning to a binary classification problem with high class imbalance between the target classes (500k vs. 31k examples). I want to write a custom loss function along the lines of: minimize(100 - ((predicted_smallerclass / total_smallerclass) * 100))
Appreciate any pointers on how I can build this logic.
A Loss Function Suitable for Class Imbalanced Data: “Focal Loss”
A widely adopted technique for dealing with highly unbalanced datasets is called resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling).
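As a minimal illustration (a sketch in plain numpy; the helper oversample_minority is my own name, not a library function), random over-sampling of the minority class could look like this:

import numpy as np

def oversample_minority(X, y, seed=0):
    # Randomly repeat minority-class rows until both classes have equal counts.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    minority_idx = np.where(y == minority)[0]
    n_extra = counts.max() - counts.min()
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    rng.shuffle(idx)
    return X[idx], y[idx]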
For binary classification, focal loss can be interpreted as a binary cross-entropy function multiplied by a modulating factor (1 - pₜ)^γ, which reduces the contribution of easy-to-classify samples. The weighting factor αₜ balances the modulating factor.
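For reference, here is a minimal TensorFlow sketch of binary focal loss (the function name binary_focal_loss is illustrative; gamma=2.0 and alpha=0.25 follow the defaults suggested in the focal loss paper):

import tensorflow as tf

def binary_focal_loss(labels, logits, gamma=2.0, alpha=0.25):
    # labels: 0/1 tensor, shape [batch_size]; logits: raw scores, shape [batch_size]
    labels = tf.cast(labels, logits.dtype)
    bce = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    p = tf.sigmoid(logits)
    p_t = labels * p + (1.0 - labels) * (1.0 - p)               # probability of the true class
    alpha_t = labels * alpha + (1.0 - labels) * (1.0 - alpha)   # class-balancing factor
    modulating = tf.pow(1.0 - p_t, gamma)                       # down-weights easy examples
    return tf.reduce_mean(alpha_t * modulating * bce)

With gamma=0.0 and alpha=0.5 this reduces to ordinary binary cross-entropy (up to a constant factor of 0.5).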
You can add class weights to the loss function by multiplying the logits. Regular cross-entropy loss looks like this:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j]))) = -x[class] + log(\sum_j exp(x[j]))
In the weighted case it becomes:
loss(x, class) = weights[class] * -x[class] + log(\sum_j exp(weights[class] * x[j]))
So by multiplying the logits, you re-scale each class's predictions by its class weight.
For example:
ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([ratio, 1.0 - ratio])
logits = ...   # shape [batch_size, 2]
labels = ...   # one-hot labels, shape [batch_size, 2]
weighted_logits = tf.multiply(logits, class_weight)   # shape [batch_size, 2]
xent = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=weighted_logits, name="xent_raw")
There is now a standard loss function that supports a per-example weight within each batch:
tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits, weights=weights)
Here weights should be transformed from class weights into a per-example weight (with shape [batch_size]); see the TensorFlow documentation for details.
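For example, a sketch of that transformation using tf.gather (the class weight values below are illustrative, roughly the inverse class frequencies from the question, assuming class 1 is the minority class):

class_weights = tf.constant([1.0, 500.0 / 31.0])   # illustrative: up-weight the minority class
labels = ...    # int class indices, shape [batch_size]
logits = ...    # shape [batch_size, 2]
weights = tf.gather(class_weights, labels)          # per-example weight, shape [batch_size]
loss = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=weights)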
The code you proposed seems wrong to me. I agree that the loss should be multiplied by the weight, but if you multiply the logits by the class weights, you end up with:
weights[class] * -x[class] + log( \sum_j exp(x[j] * weights[class]) )
The second term is not equal to:
weights[class] * log(\sum_j exp(x[j]))
To see this, we can rewrite the latter as:
log( (\sum_j exp(x[j])) ^ weights[class] )
So here is the code I'm proposing:
ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([[ratio, 1.0 - ratio]])
logits = ...   # shape [batch_size, 2]
labels = ...   # one-hot labels (float), shape [batch_size, 2]
# weight for each datapoint, depending on its label
weight_per_label = tf.transpose(tf.matmul(labels, tf.transpose(class_weight)))   # shape [1, batch_size]
xent = tf.multiply(weight_per_label,
                   tf.nn.softmax_cross_entropy_with_logits(
                       labels=labels, logits=logits, name="xent_raw"))   # shape [1, batch_size]
loss = tf.reduce_mean(xent)   # scalar