Neural network for multi label classification with large number of classes outputs only zero

Tags: machine-learning, neural-network, classification, keras

I am training a neural network for multi-label classification with a large number of classes (1000), which means that more than one output can be active for every input. On average, I have two classes active per output frame. When I train with a cross-entropy loss, the network resorts to outputting only zeros, because that output gives the lowest loss: 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?

asked Feb 10 '17 11:02

Yakku

1 Answer

TensorFlow has a loss function, tf.nn.weighted_cross_entropy_with_logits, that can be used to give more weight to the 1's, so it should be applicable to a sparse multi-label classification setting like yours.

From the documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight, allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.

The argument pos_weight is used as a multiplier for the positive targets
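To make the effect of pos_weight concrete, here is a minimal NumPy sketch of the documented formula, loss = pos_weight * labels * -log(sigmoid(logits)) + (1 - labels) * -log(1 - sigmoid(logits)). It is an illustrative re-implementation, not TensorFlow's own code, and the function name, the class indices and the logit value of -5 are made up for the example. It uses the numbers from the question (1000 classes, 2 active) and compares the loss of an "always off" prediction with and without up-weighting.

```python
import numpy as np

def weighted_bce_with_logits(labels, logits, pos_weight):
    """Illustrative NumPy version of the weighted sigmoid cross-entropy formula."""
    # log(sigmoid(x)) = -log(1 + exp(-x)), computed stably with logaddexp
    log_p = -np.logaddexp(0.0, -logits)
    # log(1 - sigmoid(x)) = -log(1 + exp(x))
    log_not_p = -np.logaddexp(0.0, logits)
    return -(pos_weight * labels * log_p + (1.0 - labels) * log_not_p)

# One sample with 1000 classes, 2 of them active (indices are arbitrary).
labels = np.zeros(1000)
labels[[3, 42]] = 1.0

# A network that confidently predicts "off" for every class.
all_off_logits = np.full(1000, -5.0)

loss_unweighted = weighted_bce_with_logits(labels, all_off_logits, 1.0).mean()
loss_weighted = weighted_bce_with_logits(labels, all_off_logits, 10.0).mean()
print(loss_unweighted, loss_weighted)
```

With pos_weight = 1 this reduces to the ordinary sigmoid cross-entropy, and the two missed positives barely register in the mean over 1000 classes. Raising pos_weight multiplies only the contribution of those missed positives, so the all-zeros solution becomes more expensive.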

If you use the TensorFlow backend in Keras, you can use the loss function like this (Keras 2.1.1):

import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor 
    and a target tensor. POS_WEIGHT is used as a multiplier 
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

model.compile(loss=weighted_binary_crossentropy, ...)
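The snippet above targets Keras 2.1.1 on TensorFlow 1.x. On TensorFlow 2.x, tf.log has moved to tf.math.log, the first argument of tf.nn.weighted_cross_entropy_with_logits is named labels rather than targets, and the keras.backend.tensorflow_backend module is no longer available in recent releases. The following is a hedged sketch of an equivalent loss for TensorFlow 2.x; the function name and the EPSILON constant (set to 1e-7, Keras' default epsilon) are assumptions for this example, and it still expects the model's last layer to use a sigmoid activation.

```python
import tensorflow as tf

POS_WEIGHT = 10.0  # multiplier for positive targets, needs to be tuned
EPSILON = 1e-7     # assumed clipping constant, equal to Keras' default epsilon

def weighted_binary_crossentropy_tf2(y_true, y_pred):
    """Weighted binary cross-entropy for sigmoid outputs (TF 2.x sketch)."""
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # Transform probabilities back to logits, avoiding log(0) and division by 0.
    y_pred = tf.clip_by_value(y_pred, EPSILON, 1.0 - EPSILON)
    logits = tf.math.log(y_pred / (1.0 - y_pred))
    # Compute the weighted loss per class, then average over the classes.
    loss = tf.nn.weighted_cross_entropy_with_logits(
        labels=y_true, logits=logits, pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
```

It is passed to model.compile(loss=weighted_binary_crossentropy_tf2, ...) in the same way. If the last layer outputs raw logits instead of sigmoid probabilities, the clipping and log-odds steps can be dropped and the logits passed to the TensorFlow function directly.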

I have not yet found many resources that report values of pos_weight that work well in relation to the number of classes, the average number of active classes, and so on.
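One common starting point, which is a general heuristic and not something the TensorFlow documentation quoted here prescribes, is the ratio of negative to positive labels in the training set. For the numbers in the question (2 active classes out of 1000) that ratio is (1000 - 2) / 2 = 499. The sketch below estimates it from a label matrix; the function name and the synthetic labels are assumptions for the example, and the per-class ratios are shown only for illustration.

```python
import numpy as np

def estimate_pos_weight(label_matrix):
    """Return (global_ratio, per_class_ratios) of negative to positive labels."""
    labels = np.asarray(label_matrix, dtype=np.float64)
    n_samples = labels.shape[0]
    pos_per_class = labels.sum(axis=0)
    neg_per_class = n_samples - pos_per_class
    # Guard against classes that have no positive example at all.
    per_class = neg_per_class / np.maximum(pos_per_class, 1.0)
    total_pos = labels.sum()
    global_ratio = (labels.size - total_pos) / max(total_pos, 1.0)
    return global_ratio, per_class

# Synthetic stand-in for real labels: 500 samples, 1000 classes, 2 active each.
rng = np.random.default_rng(0)
y = np.zeros((500, 1000))
for row in y:
    row[rng.choice(1000, size=2, replace=False)] = 1.0

global_ratio, per_class = estimate_pos_weight(y)
print(global_ratio)
```

A weight this large fully balances positives against negatives and may trade away a lot of precision, so it is probably better treated as an upper bound to tune down from than as a recommended value.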

answered Oct 04 '22 10:10

tobigue
