I'm training the im2txt sample of tensorflow and it outputs the loss:
INFO:tensorflow:global step 2174: loss = 3.6930 (15.83 sec/step)
INFO:tensorflow:global step 2175: loss = 3.6651 (15.52 sec/step)
INFO:tensorflow:global step 2176: loss = 3.5733 (18.25 sec/step)
INFO:tensorflow:global step 2177: loss = 3.1979 (18.87 sec/step)
INFO:tensorflow:global step 2178: loss = 2.9362 (15.99 sec/step)
INFO:tensorflow:global step 2179: loss = 3.6375 (15.65 sec/step)
What is loss? How does it relate to the model's probability of performing correctly (is there a formula)? What is usually an acceptable loss?
Loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfect, the loss is zero; otherwise, it is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples.
compile(optimizer='adam', loss='mean_squared_error') is the loss your LSTM model is minimizing. Mean Squared Error (MSE) is the default loss for regression problems. It is calculated as the average of the squared differences between the predicted and actual values.
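To make that concrete, here is a minimal pure-Python sketch of the MSE calculation described above. The values are made up for illustration; they are not from the question's model:

```python
# Hand-computed Mean Squared Error: average of squared differences
# between predicted and actual values (illustrative values only).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # 0.375
```

This is the same quantity Keras reports when you pass loss='mean_squared_error'.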
We use a loss function to determine how far the predicted values deviate from the actual values in the training data. We change the model weights to make the loss minimum, and that is what training is all about.
Binary cross entropy: the 'sum' reduction means that the loss function will return the sum of the per-sample losses in the batch: bce = tf.keras.losses.BinaryCrossentropy(reduction='sum'); bce(y_true, y_pred).numpy(). Using reduction='none' returns the full array of per-sample losses.
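Here is a pure-Python sketch of what those two reductions compute, without needing TensorFlow installed. The label/prediction values are hypothetical:

```python
import math

def bce_per_sample(y_true, y_pred, eps=1e-7):
    """Per-sample binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)).
    eps clipping avoids log(0), mirroring what Keras does internally."""
    return [-(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
            for y, p in zip(y_true, y_pred)]

y_true = [1.0, 0.0, 1.0]
y_pred = [0.9, 0.1, 0.8]

per_sample = bce_per_sample(y_true, y_pred)  # analogue of reduction='none'
total = sum(per_sample)                      # analogue of reduction='sum'
print(per_sample, total)
```

The default Keras reduction averages instead of summing, which is why per-batch losses stay comparable across batch sizes.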
From: https://github.com/tensorflow/models/blob/master/im2txt/im2txt/show_and_tell_model.py
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets,
                                                        logits=logits)
batch_loss = tf.div(tf.reduce_sum(tf.multiply(losses, weights)),
                    tf.reduce_sum(weights),
                    name="batch_loss")
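The batch_loss line above is a weighted average: per-token cross-entropy losses multiplied by weights, summed, and divided by the total weight (the weights zero out padding tokens). A pure-Python sketch with hypothetical values:

```python
# Weighted average of per-token losses, mirroring the tf.div / tf.reduce_sum
# computation above. Values are hypothetical, for illustration only.
losses  = [2.1, 0.5, 3.0, 1.2]   # per-token cross-entropy losses
weights = [1.0, 1.0, 0.0, 1.0]   # 0.0 masks out a padding token

batch_loss = sum(l * w for l, w in zip(losses, weights)) / sum(weights)
print(batch_loss)  # (2.1 + 0.5 + 1.2) / 3
```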
softmax
is basically a smooth, differentiable version of the max function (you can look up the exact definition in the docs). It produces high values for the largest activations. It can have multiple high activations, and the model gets penalized for all the wrong ones.
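A small pure-Python softmax shows this behaviour: the largest logit gets the largest probability, but unlike argmax the output is a smooth distribution, so several classes can carry non-trivial probability at once:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)  # largest logit -> largest probability, but all are > 0
```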
Loss is the thing that you want the model to optimize down.
The raw number doesn't usually mean much unless you've trained lots of similar models with the exact same loss. Usually you look at the loss graph to see when the model stopped making progress, so that you can stop training. Also keep in mind that for other models you may want to add extra terms to the loss (say, a penalty pushing some input weights to average to 1.0) that will bump up the loss, but that doesn't mean the model is worse.
If you want to figure out whether your model is good or bad, add metrics for the things you care about. The obvious ones are precision/recall/accuracy. There are predefined metrics you can use (streaming_accuracy). Alternatively you can compute the metric yourself and add it as a summary, but that's not going to be available from the eval dataset.
Another option is to setup a model that's obviously bad (constant or random) and compare the loss of that model with what you are getting.
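For this model, the "obviously bad" baseline is easy to compute analytically: a model that predicts a uniform distribution over the vocabulary has a cross-entropy loss of ln(V), where V is the vocabulary size. Assuming a vocabulary of roughly 12,000 words (a hypothetical figure for illustration, not taken from the question):

```python
import math

# Cross-entropy of a uniform guess over a vocabulary of V words is ln(V).
# V = 12000 is an assumed, illustrative vocabulary size.
vocab_size = 12000
uniform_loss = math.log(vocab_size)
print(uniform_loss)  # roughly 9.4
```

Against that baseline, the losses around 3.6 in the question's log indicate the model has already learned a great deal, though only task-level metrics tell you whether the captions are actually good.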
Loss is the target function that the optimization algorithm will try to minimize.
In general, you want your loss function to be a measure of how bad your model is. But because the optimization algorithms require a few mathematical properties to work nicely, you can't pick the usual stuff like precision and recall (you want continuous functions that are differentiable in relation to the model parameters).
With classification tasks, softmax is a common choice. It's a smooth and well-behaved version of argmax, which is used to pick the class with the highest network activation. With regression, the usual mean squared error serves fine.