Can someone tell me mathematically how sample_weight and class_weight are used in Keras in the calculation of the loss function and metrics? A simple mathematical expression would be great.
As mentioned in the Keras Official Docs, class_weight : Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
When they are set inversely proportional to class frequency, class weights give all classes equal importance in gradient updates, on average, regardless of how many samples each class has in the training data. This keeps the model from predicting the more frequent class more often just because it is more common.
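As a rough illustration of weights that equalize the classes, here is a minimal NumPy sketch. It assumes the common "balanced" heuristic n_samples / (n_classes * class_count), which is not stated anywhere in this thread; the helper name balanced_class_weights and the toy labels are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class_index: weight}, with weight inversely proportional to class frequency."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    # Assumed heuristic: n_samples / (n_classes * count), so that every class
    # contributes the same total weight (count * weight) to the loss.
    weights = labels.size / (classes.size * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

labels = [0, 0, 0, 0, 0, 0, 1, 1]  # toy data: class 1 is under-represented
class_weight = balanced_class_weights(labels)

# Total weight per class: count * weight, equal across classes by construction.
totals = {c: labels.count(c) * w for c, w in class_weight.items()}
print(class_weight, totals)
```

A dictionary of this shape is what the class_weight argument of model.fit expects.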
It is a simple multiplication: the loss contributed by each sample is scaled by its sample weight. Assume samples i = 1, ..., n, a vector of sample weights w of length n, and let L_i denote the loss for sample i. The weighted loss for sample i is then:
$$\tilde{L}_i = w_i L_i$$
In Keras in particular, the product of each sample's loss with its weight is also divided by the fraction of weights that are non-zero, so that the batch mean is effectively taken over only the samples with non-zero weight. Let p be the proportion of non-zero weights. The per-sample term that Keras averages over the batch is:
$$\tilde{L}_i = \frac{1}{p} w_i L_i$$
Here's the relevant snippet of code from the Keras repo:
score_array = loss_fn(y_true, y_pred)
if weights is not None:
    score_array *= weights
    score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
return K.mean(score_array)
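To make the arithmetic concrete, the following NumPy sketch mirrors the snippet step by step. It is not Keras code: the function name weighted_batch_loss and the toy numbers are made up for illustration, and it assumes at least one weight is non-zero (otherwise p would be 0).

```python
import numpy as np

def weighted_batch_loss(per_sample_loss, weights=None):
    """NumPy mirror of the snippet: scale by weights, divide by p, then average."""
    score = np.asarray(per_sample_loss, dtype=float)
    if weights is not None:
        weights = np.asarray(weights, dtype=float)
        score = score * weights
        # p: proportion of non-zero weights in the batch (assumed > 0 here)
        p = np.mean(weights != 0)
        score = score / p
    return float(np.mean(score))

losses = np.array([0.5, 1.0, 2.0, 4.0])   # L_i, hypothetical per-sample losses
weights = np.array([1.0, 2.0, 0.0, 0.0])  # w_i, two samples are masked out

batch_loss = weighted_batch_loss(losses, weights)
# Equivalent closed form: sum(w_i * L_i) / (number of non-zero weights)
closed_form = float(np.sum(weights * losses) / np.count_nonzero(weights))
print(batch_loss, closed_form)
```

Both values agree, which shows that dividing by p and then taking the mean over all n samples is the same as averaging the weighted losses over only the samples whose weight is non-zero.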
class_weight is used in the same way as sample_weight; it is just provided as a convenience to specify certain weights across entire classes.
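One way to picture that equivalence is to expand a class_weight dictionary into a per-sample weight vector by label lookup. This is a sketch of the idea, not the Keras internals; the dictionary values and labels below are hypothetical.

```python
import numpy as np

class_weight = {0: 1.0, 1: 3.0}      # hypothetical per-class weights
y_true = np.array([0, 1, 1, 0, 0])   # integer class labels, one per sample

# Each sample takes the weight of its class, giving a sample_weight-style vector.
sample_weight = np.array([class_weight[int(c)] for c in y_true])
print(sample_weight)
```

The resulting vector then plays the role of w in the weighted-loss expression.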
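Sample weights are applied to the loss, not to the metrics listed under metrics. To have a metric weighted as well, pass it through the weighted_metrics argument of compile.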