I am training a neural network for multi-label classification with a large number of classes (1000), which means that more than one output can be active for each input. On average, two classes are active per output frame. When I train with a cross-entropy loss, the network resorts to outputting only zeros, because that output gives the lowest loss: 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?
Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and categorical cross-entropy as the loss function instead of MSE, and I understand more or less why.
There are two main methods for tackling a multi-label classification problem: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of binary classification problems, which can then be handled using single-class classifiers.
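As a rough illustration of the problem transformation idea, here is a minimal pure-Python sketch that turns per-sample label sets into one binary target vector per class. The function names and the toy data are hypothetical, chosen only for this example; real pipelines usually rely on a library for this step.

```python
def to_indicator_matrix(label_sets, num_classes):
    """Turn each sample's set of class indices into a 0/1 row of length num_classes."""
    rows = []
    for labels in label_sets:
        row = [0] * num_classes
        for class_index in labels:
            row[class_index] = 1
        rows.append(row)
    return rows


def binary_problems(indicator_rows):
    """Column j becomes the target vector of the binary problem 'is class j present?'."""
    return [list(column) for column in zip(*indicator_rows)]


# Toy data: two samples, three classes (hypothetical values).
indicator = to_indicator_matrix([{0, 2}, {1}], num_classes=3)
per_class_targets = binary_problems(indicator)
```

Each entry of per_class_targets can then be handed to an ordinary binary classifier, independently of the others.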
Multiclass Classification Neural Network using Adam Optimizer.
Modified Cross-Entropy loss for multi-label classification and handling imbalanced data.
TensorFlow has a loss function weighted_cross_entropy_with_logits, which can be used to give more weight to the 1's. So it should be applicable to a sparse multi-label classification setting like yours.
From the documentation:
This is like sigmoid_cross_entropy_with_logits() except that pos_weight, allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
The argument pos_weight is used as a multiplier for the positive targets.
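To make the effect of pos_weight concrete, here is a minimal pure-Python sketch of the per-element formula that the TensorFlow documentation gives for this loss: pos_weight * y * -log(sigmoid(x)) + (1 - y) * -log(1 - sigmoid(x)). The helper names (softplus, weighted_bce_with_logits, mean_loss) and the logit values of -6 and 0 are my own illustrative choices, not part of TensorFlow.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(label, logit, pos_weight):
    """Per-element weighted sigmoid cross-entropy.

    Uses -log(sigmoid(x)) = softplus(-x) and -log(1 - sigmoid(x)) = softplus(x).
    """
    return pos_weight * label * softplus(-logit) + (1.0 - label) * softplus(logit)


def mean_loss(labels, logits, pos_weight):
    """Average the per-element loss over all classes of one sample."""
    total = sum(weighted_bce_with_logits(y, x, pos_weight)
                for y, x in zip(labels, logits))
    return total / len(labels)


# Setting from the question: 1000 classes, 2 of them active.
labels = [1.0, 1.0] + [0.0] * 998
all_off = [-6.0] * 1000   # network predicting ~0 for every class
undecided = [0.0] * 1000  # network predicting 0.5 for every class

loss_plain = mean_loss(labels, all_off, pos_weight=1.0)
loss_undecided = mean_loss(labels, undecided, pos_weight=1.0)
loss_weighted = mean_loss(labels, all_off, pos_weight=10.0)
```

With pos_weight=1 the all-zeros prediction already scores far better than an undecided network, which is why training collapses to it. Raising pos_weight multiplies only the penalty for the two missed positives, so the same all-zeros output becomes more expensive.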
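If you use the TensorFlow backend in Keras, you can use the loss function like this (Keras 2.1.1 with TensorFlow 1.x; in TensorFlow 2.x, tf.log has become tf.math.log and the targets argument is named labels):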
import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
Then in your model:
model.compile(loss=weighted_binary_crossentropy, ...)
I have not yet found many resources that report values of pos_weight that work well in relation to the number of classes, the average number of active classes, etc.
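One common starting point, which is my assumption and not something the TensorFlow documentation prescribes, is the ratio of negative to positive label entries in the training set. A minimal sketch, with a hypothetical helper name and toy labels that mimic the question's 2-in-1000 sparsity:

```python
def estimate_pos_weight(label_rows):
    """Ratio of negative to positive entries across all samples and classes."""
    positives = sum(sum(row) for row in label_rows)
    total = sum(len(row) for row in label_rows)
    if positives == 0:
        raise ValueError("no positive labels found")
    return (total - positives) / positives


# Toy labels: 4 samples, 1000 classes, 2 active per sample (hypothetical data).
rows = [[1, 1] + [0] * 998 for _ in range(4)]
estimate = estimate_pos_weight(rows)
```

For labels this sparse the ratio is very large and will probably trade away a lot of precision, so I would treat it as an upper bound and tune downwards from there on a validation set.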