I'm learning the Keras API in TensorFlow (2.3). In this guide on the TensorFlow website, I found an example of a custom loss function:
def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))
The reduce_mean function in this custom loss function returns a scalar.
Is it right to define a loss function like this? As far as I know, the first dimension of the shapes of y_true and y_pred is the batch size. I think the loss function should return a loss value for every sample in the batch, so it should give an array of shape (batch_size,). But the above function gives a single value for the whole batch.
Maybe the above example is wrong? Could anyone give me some help on this problem?
p.s. Why do I think the loss function should return an array rather than a single value?
I read the source code of the Model class. When you provide a loss function (please note it's a function, not a loss class) to the Model.compile() method, this loss function is used to construct a LossesContainer object, which is stored in Model.compiled_loss. The loss function passed to the constructor of the LossesContainer class is used once again to construct a LossFunctionWrapper object, which is stored in LossesContainer._losses.
According to the source code of the LossFunctionWrapper class, the overall loss value for a training batch is calculated by the LossFunctionWrapper.__call__() method (inherited from the Loss class), i.e. it returns a single loss value for the whole batch. But LossFunctionWrapper.__call__() first calls the LossFunctionWrapper.call() method to obtain an array of losses, one for every sample in the training batch. These losses are then averaged to get the single loss value for the whole batch. It is inside the LossFunctionWrapper.call() method that the loss function provided to Model.compile() is called.
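The two-step path described here can be observed on a built-in loss object. This is only a minimal sketch, assuming TensorFlow 2.x with eager execution; the toy tensors are made up for illustration, and call() is invoked directly just to expose the intermediate per-sample values.

```python
import numpy as np
import tensorflow as tf

# Toy batch: 4 samples, 3 outputs each (illustrative values only).
y_true = tf.constant([[0., 1., 2.], [1., 1., 1.], [2., 0., 2.], [3., 3., 3.]])
y_pred = tf.constant([[0., 1., 1.], [1., 2., 1.], [0., 0., 0.], [3., 3., 3.]])

mse = tf.keras.losses.MeanSquaredError()

# call() runs the wrapped loss function: one value per sample.
per_sample = np.asarray(mse.call(y_true, y_pred))

# __call__() runs call() and then applies the reduction (default: batch mean).
reduced = float(mse(y_true, y_pred))

print(per_sample.shape)                        # (4,)
print(np.isclose(reduced, per_sample.mean()))  # True
```

The reduced value matches the mean of the per-sample values, which is the behaviour the source code walk-through describes.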
That's why I think the custom loss function should return an array of losses, instead of a single scalar value. Besides, if we write a custom Loss class for the Model.compile() method, the call() method of our custom Loss class should also return an array, rather than a single value.
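As a rough illustration of the class-based case, here is a sketch of a custom Loss subclass whose call() keeps the batch axis. The class name PerSampleMSE and the sample data are hypothetical, chosen only for this example; it assumes TensorFlow 2.x with eager execution.

```python
import numpy as np
import tensorflow as tf

class PerSampleMSE(tf.keras.losses.Loss):  # hypothetical name, for illustration
    def call(self, y_true, y_pred):
        # Average over the feature axis only, keeping the batch axis.
        return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

y_true = tf.constant([[1., 2.], [3., 4.], [5., 6.]])
y_pred = tf.constant([[1., 0.], [3., 4.], [4., 6.]])

loss_obj = PerSampleMSE()
per_sample = np.asarray(loss_obj.call(y_true, y_pred))  # one loss per sample
batch_loss = float(loss_obj(y_true, y_pred))            # scalar after reduction

print(per_sample)  # [2.  0.  0.5]
```

Calling the object (rather than call()) applies the default reduction, so batch_loss is the mean of the three per-sample values.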
I opened an issue on GitHub. It's confirmed that a custom loss function is required to return one loss value per sample. The example will need to be updated to reflect this.
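For reference, a per-sample variant of the guide's function could look like the sketch below. This is not the official updated example; the function name per_sample_mean_squared_error and the dummy tensors are assumptions made for illustration.

```python
import tensorflow as tf

def per_sample_mean_squared_error(y_true, y_pred):
    # Reduce over the last axis only, so the batch axis survives.
    return tf.math.reduce_mean(tf.square(y_true - y_pred), axis=-1)

# Dummy batch of 8 samples with 3 outputs each.
y_true = tf.zeros((8, 3))
y_pred = tf.ones((8, 3))

# Original version: no axis, so everything collapses to a scalar.
scalar_shape = tuple(tf.math.reduce_mean(tf.square(y_true - y_pred)).shape)
# Per-sample version: one value for each of the 8 samples.
per_sample_shape = tuple(per_sample_mean_squared_error(y_true, y_pred).shape)

print(scalar_shape, per_sample_shape)  # () (8,)
```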
Actually, as far as I know, the shape of the loss function's return value is not important, i.e. it could be a scalar tensor or a tensor of one or multiple values per sample. The important thing is how it gets reduced to a scalar value so that it can be used in the optimization process or shown to the user. For that, you can check the reduction types in the Reduction documentation.
Further, here is what the compile method documentation says about the loss argument, partially addressing this point:
loss: String (name of objective function), objective function or tf.keras.losses.Loss instance. See tf.keras.losses. An objective function is any callable with the signature loss = fn(y_true, y_pred), where y_true = ground truth values with shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]. y_pred = predicted values with shape = [batch_size, d0, .. dN]. It returns a weighted loss float tensor. If a custom Loss instance is used and reduction is set to NONE, return value has the shape [batch_size, d0, .. dN-1] ie. per-sample or per-timestep loss values; otherwise, it is a scalar. If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.
In addition, it's worth noting that most of the built-in loss functions in TF/Keras reduce over the last dimension (i.e. axis=-1).
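To make the shapes concrete, here is a small sketch using the built-in MeanSquaredError class with reduction turned off. The tensor sizes are arbitrary and chosen only for illustration; it assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

# Batch of 4 samples, 5 timesteps, 3 features (arbitrary sizes).
y_true = tf.zeros((4, 5, 3))
y_pred = tf.ones((4, 5, 3))

# With reduction disabled, only the last axis is averaged away.
mse_none = tf.keras.losses.MeanSquaredError(reduction="none")
unreduced_shape = tuple(mse_none(y_true, y_pred).shape)

# With the default reduction, the result is a scalar.
mse_default = tf.keras.losses.MeanSquaredError()
reduced_shape = tuple(mse_default(y_true, y_pred).shape)

print(unreduced_shape, reduced_shape)  # (4, 5) ()
```

This matches the [batch_size, d0, .. dN-1] shape mentioned in the quoted documentation.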
For those who doubt that a custom loss function which returns a scalar value would work: you can run the following snippet and you will see that the model would train and converge properly.
import tensorflow as tf
import numpy as np

def custom_loss(y_true, y_pred):
    # Returns a single scalar for the whole batch (sum of squared errors).
    return tf.reduce_sum(tf.square(y_true - y_pred))

inp = tf.keras.layers.Input(shape=(3,))
out = tf.keras.layers.Dense(3)(inp)
model = tf.keras.Model(inp, out)
# `learning_rate` replaces the deprecated `lr` argument.
model.compile(loss=custom_loss, optimizer=tf.keras.optimizers.Adam(learning_rate=0.1))

# Synthetic linear data: y = 10 * x + 2.5
x = np.random.rand(1000, 3)
y = x * 10 + 2.5
model.fit(x, y, epochs=20)
I think the question posted by @Gödel is totally legit and is correct. The custom loss function should return a loss value per sample. And, an explanation provided by @today is also correct. In the end, it all depends on the kind of reduction used.
So if one uses the class API to create a loss function, the reduction parameter is automatically inherited by the custom class. Its default value "sum_over_batch_size" is used (which simply averages all the loss values in a given batch). The other options are "sum", which computes a sum instead of an average, and "none", where an array of loss values is returned.
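The three reduction options can be compared side by side. The following is a minimal sketch using the built-in MeanAbsoluteError with made-up values, assuming TensorFlow 2.x with eager execution and the string forms of the reduction names.

```python
import numpy as np
import tensorflow as tf

# Three samples whose per-sample absolute errors are 1, 2 and 3.
y_true = tf.constant([[0., 0.], [0., 0.], [0., 0.]])
y_pred = tf.constant([[1., 1.], [2., 2.], [3., 3.]])

results = {}
for reduction in ("sum_over_batch_size", "sum", "none"):
    mae = tf.keras.losses.MeanAbsoluteError(reduction=reduction)
    results[reduction] = np.asarray(mae(y_true, y_pred))

print(results["none"])                 # [1. 2. 3.]
print(results["sum"])                  # 6.0
print(results["sum_over_batch_size"])  # 2.0
```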
It is also mentioned in the Keras documentation that these differences in reduction are irrelevant when one is using model.fit(), because reduction is then handled automatically by TF/Keras.
And, lastly, it is also mentioned that when a custom loss function is created, then, an array of losses (individual sample losses) should be returned. Their reduction is handled by the framework.
Called without an axis argument, tf.math.reduce_mean averages over every element in the batch and returns that single value. That's why the result is a scalar.