 

Training MSE loss larger than theoretical maximum?

I am training a keras model whose last layer is a single sigmoid unit:

output = Dense(units=1, activation='sigmoid')

I am training this model with some training data in which the expected output is always a number between 0.0 and 1.0. I am compiling the model with mean-squared-error:

model.compile(optimizer='adam', loss='mse')

Since both the expected output and the real output are single floats between 0 and 1, I was expecting a loss between 0 and 1 as well, but when I start the training I get a loss of 3.3932, larger than 1.

Am I missing something?

Edit: I am adding an example to show the problem: https://drive.google.com/file/d/1fBBrgW-HlBYhG-BUARjTXn3SpWqrHHPK/view?usp=sharing (I cannot just paste the code because I need to attach the training data)

After running python stackoverflow.py, the summary of the model will be shown, as well as the training process. I also print the minimum and maximum values of y_true each step to verify that they are within the [0, 1] range. There is no need to wait for the training to finish, you will see that the loss during the first few epochs is much larger than 1.

oooliverrr, asked Aug 30 '20


1 Answer

First, we can demystify the mse loss - it is an ordinary callable function in tf.keras:

import tensorflow as tf
import numpy as np

mse = tf.keras.losses.mse
print(mse([1] * 3, [0] * 3))  # tf.Tensor(1, shape=(), dtype=int32)
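For intuition, the same value can be computed by hand with plain numpy (a sketch of the math only; the real tf.keras implementation also handles casting and broadcasting):

```python
import numpy as np

# Manual mean squared error: elementwise squared difference,
# then the mean over the last axis.
y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([0.0, 0.0, 0.0])
print(np.mean((y_true - y_pred) ** 2))  # 1.0
```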

Next, as the name "mean squared error" implies, it is a mean, so the size of the vectors passed to it does not change the value as long as the mean is the same:

print(mse([1] * 10, [0] * 10)) # tf.Tensor(1, shape=(), dtype=int32)

In order for the mse to exceed 1, the average squared error must exceed 1:

print( mse(np.random.random((100,)), np.random.random((100,))) )  # tf.Tensor(0.14863832582680103, shape=(), dtype=float64)
print( mse( 10 * np.random.random((100,)), np.random.random((100,))) )  # tf.Tensor(30.51209646429651, shape=(), dtype=float64)
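Conversely, when both targets and predictions lie in [0, 1], each squared error is at most (1 - 0)^2 = 1, so their mean can never exceed 1. A quick numpy sketch (the seed and sizes are arbitrary):

```python
import numpy as np

# With both arrays confined to [0, 1), every squared difference is
# at most 1, so the mean squared error is bounded by 1 as well.
rng = np.random.default_rng(0)
y_true = rng.random(1000)  # uniform in [0, 1)
y_pred = rng.random(1000)
print(np.mean((y_true - y_pred) ** 2) <= 1.0)  # True
```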

Lastly, sigmoid indeed guarantees that the output is between 0 and 1:

sigmoid = tf.keras.activations.sigmoid
signal = 10 * np.random.random((100,))

output = sigmoid(signal)
print(f"Raw: {np.mean(signal):.2f}; Sigmoid: {np.mean(output):.2f}" )  # Raw: 5.35; Sigmoid: 0.92

What this implies is that in your code, y_true contains values outside the [0, 1] range: since the sigmoid predictions are confined to (0, 1), no target inside [0, 1] can contribute a squared error above 1, let alone an average of 3.39.

You can verify this with np.min(y_true) and np.max(y_true).
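As a hypothetical illustration with made-up numbers: a few out-of-range targets readily push the loss above 1, even though the predictions stay inside (0, 1):

```python
import numpy as np

# Hypothetical targets containing out-of-range values;
# the predictions are plausible sigmoid outputs in (0, 1).
y_true = np.array([0.2, 0.8, 5.0, -3.0, 0.5])
y_pred = np.array([0.3, 0.7, 0.9, 0.1, 0.5])
print(np.min(y_true), np.max(y_true))   # -3.0 5.0 -- out of range
print(np.mean((y_true - y_pred) ** 2))  # well above 1
```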

ikamen, answered Sep 22 '22