Unbalanced data and weighted cross entropy

I'm trying to train a network on an unbalanced dataset. I have four classes: A (198 samples), B (436 samples), C (710 samples) and D (272 samples). I have read about weighted_cross_entropy_with_logits, but all the examples I found are for binary classification, so I'm not very confident about how to set those weights.

Total samples: 1616

A_weight: 198/1616 = 0.12?

The idea behind it, if I understood correctly, is to penalize the errors on the majority class and to value the hits on the minority one more, right?

My piece of code:

    weights = tf.constant([0.12, 0.26, 0.43, 0.17])
    cost = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(logits=pred, targets=y, pos_weight=weights))

I have read this one and other examples with binary classification, but it is still not very clear to me.

Thanks in advance.

asked Jun 15 '17 06:06

Sergiodiaz53

2 Answers

Note that weighted_cross_entropy_with_logits is the weighted variant of sigmoid_cross_entropy_with_logits. Sigmoid cross entropy is typically used for binary classification. Yes, it can handle multiple labels, but sigmoid cross entropy basically makes a (binary) decision on each of them -- for example, for a face recognition net, those (not mutually exclusive) labels could be "Does the subject wear glasses?", "Is the subject female?", etc.

In binary classification(s), each output channel corresponds to a binary (soft) decision. Therefore, the weighting needs to happen within the computation of the loss. This is what weighted_cross_entropy_with_logits does, by weighting one term of the cross-entropy over the other.
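To make "weighting one term over the other" concrete, here is a minimal sketch in plain Python (no TensorFlow) of the per-element formula that the TensorFlow documentation gives for weighted_cross_entropy_with_logits. The helper names and the sample values are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_sigmoid_cross_entropy(logit, label, pos_weight):
    """Per-element loss: pos_weight scales only the positive (label == 1) term."""
    p = sigmoid(logit)
    positive_term = -label * math.log(p) * pos_weight
    negative_term = -(1.0 - label) * math.log(1.0 - p)
    return positive_term + negative_term

# With pos_weight = 1 this is the ordinary sigmoid cross entropy.
plain = weighted_sigmoid_cross_entropy(0.5, 1.0, pos_weight=1.0)
# pos_weight = 3 makes the loss on a positive example three times larger ...
boosted = weighted_sigmoid_cross_entropy(0.5, 1.0, pos_weight=3.0)
# ... while the loss on a negative example does not depend on pos_weight.
negative_a = weighted_sigmoid_cross_entropy(0.5, 0.0, pos_weight=1.0)
negative_b = weighted_sigmoid_cross_entropy(0.5, 0.0, pos_weight=3.0)
```

So pos_weight trades off the two outcomes of one binary decision per channel; it is not a per-class weight for a mutually exclusive choice between classes.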

In multiclass classification, where the labels are mutually exclusive, we use softmax_cross_entropy_with_logits, which behaves differently: each output channel corresponds to the score of a class candidate. The decision comes afterwards, by comparing the respective outputs of each channel.

Weighting before the final decision is therefore a simple matter of modifying the scores before comparing them, typically by multiplication with weights. For example, for a ternary classification task,

    import tensorflow as tf

    # onehot_labels and logits are assumed to have shape (batch_size, 3)
    # your class weights
    class_weights = tf.constant([[1.0, 2.0, 3.0]])
    # deduce weights for batch samples based on their true label
    weights = tf.reduce_sum(class_weights * onehot_labels, axis=1)
    # compute your (unweighted) softmax cross entropy loss
    # (keyword arguments are required in TensorFlow 1.x)
    unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)
    # apply the weights elementwise, one weight per sample
    weighted_losses = unweighted_losses * weights
    # reduce the result to get your final loss
    loss = tf.reduce_mean(weighted_losses)

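You could also rely on tf.losses.softmax_cross_entropy (tf.compat.v1.losses.softmax_cross_entropy in TensorFlow 2.x) to handle the last three steps, by passing the per-sample weights as its weights argument.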
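If it helps to see the arithmetic without TensorFlow, the following is a rough plain-Python sketch of the same per-sample weighting. The two-sample batch and the helper function are hypothetical, chosen only to show how the weight of the true class scales each sample's loss.

```python
import math

def softmax_cross_entropy(onehot, logits):
    """Unweighted softmax cross entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_sum_exp) for t, v in zip(onehot, logits))

class_weights = [1.0, 2.0, 3.0]

# Hypothetical batch of two samples whose true classes are 0 and 2.
onehot_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
logits = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0]]

# Per-sample weight = weight of the true class (sum over the one-hot row).
weights = [sum(w * t for w, t in zip(class_weights, row)) for row in onehot_labels]

unweighted_losses = [softmax_cross_entropy(t, z) for t, z in zip(onehot_labels, logits)]
weighted_losses = [l * w for l, w in zip(unweighted_losses, weights)]
loss = sum(weighted_losses) / len(weighted_losses)
```

The second sample is both misclassified and belongs to the class with the largest weight, so it dominates the averaged loss.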

In your case, where you need to tackle data imbalance, the class weights could indeed be inversely proportional to the class frequencies in your training data. Normalizing them so that they sum up to one or to the number of classes also makes sense.
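As a rough illustration using the sample counts from the question, inverse-frequency weights and the two normalizations mentioned above could be computed like this (plain Python; the variable names are only for illustration):

```python
counts = {"A": 198, "B": 436, "C": 710, "D": 272}
total = sum(counts.values())  # 1616 samples in total

# Raw inverse-frequency weights: rarer classes get larger weights.
inverse_freq = {name: total / n for name, n in counts.items()}

# Option 1: normalize so that the weights sum to one.
norm = sum(inverse_freq.values())
weights_sum_to_one = {name: w / norm for name, w in inverse_freq.items()}

# Option 2: normalize so that the weights sum to the number of classes.
num_classes = len(counts)
weights_sum_to_k = {name: w * num_classes for name, w in weights_sum_to_one.items()}

for name in counts:
    print(name, round(weights_sum_to_one[name], 3), round(weights_sum_to_k[name], 3))
```

Note that this is the opposite of 198/1616 = 0.12 from the question: using the frequency itself would give the rarest class (A) the smallest weight, whereas here it gets the largest.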

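Note that in the above, we penalized the loss based on the true label of the samples. We could also have penalized the loss based on the estimated labels, by building the per-sample weights from the predictions instead of from onehot_labels, for instance

    weights = tf.gather(class_weights[0], tf.argmax(logits, axis=1))

and the rest of the code need not change. (Setting weights = class_weights directly does not work in general: unweighted_losses has shape (batch_size,), which only broadcasts against a (1, num_classes) tensor when the two sizes happen to match.)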

In the general case, you would want weights that depend on the kind of error you make. In other words, for each pair of labels X and Y, you could choose how to penalize choosing label X when the true label is Y. You end up with a whole prior weight matrix, which results in weights above being derived from a full (num_samples, num_classes) tensor. This goes a bit beyond what you want, but it might be useful to know nonetheless that only your definition of the weight tensor needs to change in the code above.
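One possible reading of this general case, sketched in plain Python: each sample picks the row of a penalty matrix that matches its true label, and that row is collapsed to a single weight using the predicted class probabilities. The matrix values, the batch, and the choice of reduction (an expected penalty) are all assumptions made for illustration, not something prescribed above.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical penalty matrix: penalty[y][x] is the weight applied when the
# true label is y and the network favours label x.
penalty = [
    [1.0, 2.0, 5.0],
    [1.0, 1.0, 1.0],
    [3.0, 1.0, 1.0],
]

true_labels = [0, 2]
logits = [[0.0, 1.0, 3.0], [0.0, 1.0, 3.0]]

# One row of the matrix per sample: a (num_samples, num_classes) weight table.
weight_rows = [penalty[y] for y in true_labels]

# Assumed reduction: expected penalty under the predicted class probabilities.
sample_weights = [
    sum(w * p for w, p in zip(row, softmax(z)))
    for row, z in zip(weight_rows, logits)
]
```

Here the first sample (true class 0, prediction leaning towards class 2) receives a much larger weight than the second one, whose prediction agrees with its true class.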

answered Oct 19 '22 23:10

P-Gn

See this answer for an alternate solution which works with sparse_softmax_cross_entropy:

    import tensorflow as tf
    import numpy as np

    np.random.seed(123)
    sess = tf.InteractiveSession()

    # let's say we have the logits and labels of a batch of size 6 with 5 classes
    logits = tf.constant(np.random.randint(0, 10, 30).reshape(6, 5), dtype=tf.float32)
    labels = tf.constant(np.random.randint(0, 5, 6), dtype=tf.int32)

    # specify some class weightings
    class_weights = tf.constant([0.3, 0.1, 0.2, 0.3, 0.1])

    # specify the weights for each sample in the batch (without having to compute the onehot label matrix)
    weights = tf.gather(class_weights, labels)

    # compute the loss
    tf.losses.sparse_softmax_cross_entropy(labels, logits, weights).eval()
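For what it's worth, the gather step is just an index lookup, and it yields the same per-sample weights as the one-hot multiply-and-sum route. A small plain-Python sketch with a made-up batch of four integer labels:

```python
class_weights = [0.3, 0.1, 0.2, 0.3, 0.1]
labels = [2, 0, 4, 1]  # hypothetical integer labels for a batch of four

# What tf.gather(class_weights, labels) computes: one index lookup per sample.
gathered = [class_weights[label] for label in labels]

# The one-hot route: build the one-hot rows, multiply and sum over classes.
num_classes = len(class_weights)
onehot = [[1.0 if c == label else 0.0 for c in range(num_classes)] for label in labels]
summed = [sum(w * t for w, t in zip(class_weights, row)) for row in onehot]
```

Both lists hold the weight of each sample's true class, so the sparse variant simply skips building the one-hot matrix.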

answered Oct 20 '22 01:10

DankMasterDan

Tags: python, machine-learning, tensorflow, deep-learning
