
Keras mean squared error loss layer

I am currently implementing a custom loss layer, and in the process I stumbled upon the implementation of mean squared error in the objectives.py file [1]. I know I'm missing something in my understanding of this loss calculation, because I always thought the average was done separately across the samples for each output in each mini-batch (axis 0 of the tensor), but it appears the average is actually being done across the last axis, which, for a single output vector, would mean it's being done across the outputs. I found this by accident while working on my custom loss layer, because it requires discounting the loss of a few of the outputs if a training output in a specific place has a specific value. Anyway, is my understanding of mean squared error incorrect? Why would Keras be using the last axis, thus turning a 1xn output vector into a 1x1 output vector?

Thanks.

[1] https://github.com/fchollet/keras/blob/master/keras/objectives.py#L7
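
For concreteness, here is a small numpy sketch of the two conventions I am contrasting (numpy standing in for the Keras backend):

import numpy as np

# a mini-batch of 4 samples, each with 3 outputs: shape (4, 3)
y_true = np.arange(12.0).reshape(4, 3)
y_pred = y_true + 0.5          # every prediction is off by 0.5

sq_err = np.square(y_pred - y_true)

# averaging across samples (axis 0): one value per output, shape (3,)
print(np.mean(sq_err, axis=0))    # [0.25 0.25 0.25]

# averaging across outputs (the last axis), as objectives.py does:
# one value per sample, shape (4,)
print(np.mean(sq_err, axis=-1))   # [0.25 0.25 0.25 0.25]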

asked Jan 17 '17 by Corey J. Nolet


2 Answers

The code in question for the MSE loss is this:

from keras import backend as K

def mean_squared_error(y_true, y_pred):
    # elementwise squared error, then the mean over the last axis only
    return K.mean(K.square(y_pred - y_true), axis=-1)

Here y_true is first subtracted from y_pred, then that result is passed to K.square, which, as expected, returns the elementwise square of its argument, and that result is given to K.mean, which computes the mean along the requested axis.

So the code is clearly doing what it's supposed to do. As for why the last axis is operated on: this has nothing to do with classes, it is just a convention. Note that in general there are no classes in the MSE definition.
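
To see the resulting shape directly, you can evaluate the function on constant tensors (a minimal sketch, assuming a Keras 2-style backend where K.constant and K.eval are available):

import numpy as np
from keras import backend as K

y_true = K.constant(np.zeros((2, 4)))  # batch of 2 samples, 4 outputs each
y_pred = K.constant(np.ones((2, 4)))   # every prediction is off by 1

loss = mean_squared_error(y_true, y_pred)
print(K.eval(loss))        # [1. 1.]
print(K.eval(loss).shape)  # (2,) -- one loss value per sample, not per output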

answered Sep 28 '22 by Dr. Snoopy

Let's detail the steps of how the losses are computed in Keras, to show that axis=-1 in all the loss computations is correct:

  • We pick a loss in losses.py that we will pass to the compile method of our model.

  • In compile, the total loss is computed. This happens in several steps: the first step creates a list of losses, one for each output of the model.

  • This first step calls _weighted_masked_objective, which according to the docs 'Adds support for masking and sample-weighting to an objective function'.
  • Basically, _weighted_masked_objective returns a new objective function which takes into account the weights and mask parameters that the user will provide when using the method fit.

If I cut the code down to only the lines that matter for the question, we get something like this:

def _weighted_masked_objective(fn):
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)  # compute the per-sample loss as in losses.py
        return K.mean(score_array)        # then average over all remaining axes
    return weighted

class Model(Container):
    def compile(self, optimizer, loss, metrics=None, loss_weights=None,
                sample_weight_mode=None, weighted_metrics=None,
                target_tensors=None, **kwargs):
        # loss_functions is built from the `loss` argument earlier in compile
        weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

So in the end, the loss is indeed averaged over every dimension; the use of axis=-1 in the loss functions is just an elegant way to keep one loss value per sample, which enables masking and weighting of the loss at another point in the code.
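
As an illustrative numpy sketch (not the actual Keras code, which also normalizes by the weights and handles masks), this shows why keeping one loss value per sample matters: sample weights of shape (batch,) can be applied before the final mean.

import numpy as np

def mse_per_sample(y_true, y_pred):
    # stage 1 (losses.py): axis=-1 keeps one loss value per sample
    return np.mean(np.square(y_pred - y_true), axis=-1)

y_true = np.zeros((3, 2))
y_pred = np.array([[1.0, 1.0],
                   [2.0, 2.0],
                   [3.0, 3.0]])
weights = np.array([1.0, 0.5, 0.0])  # e.g. ignore the third sample entirely

score_array = mse_per_sample(y_true, y_pred)  # shape (3,): [1. 4. 9.]
weighted = score_array * weights              # [1. 2. 0.]

# stage 2 (_weighted_masked_objective): average over everything that is left
print(np.mean(weighted))  # 1.0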

NB: I didn't explain the other steps because they don't contribute to answering the question.

answered Sep 28 '22 by mpariente