
What is loss exactly?

I'm training the im2txt sample of tensorflow and it outputs the loss:

INFO:tensorflow:global step 2174: loss = 3.6930 (15.83 sec/step)
INFO:tensorflow:global step 2175: loss = 3.6651 (15.52 sec/step)
INFO:tensorflow:global step 2176: loss = 3.5733 (18.25 sec/step)
INFO:tensorflow:global step 2177: loss = 3.1979 (18.87 sec/step)
INFO:tensorflow:global step 2178: loss = 2.9362 (15.99 sec/step)
INFO:tensorflow:global step 2179: loss = 3.6375 (15.65 sec/step)

What is loss? How does it relate to the AI's probability of performing correctly (is there a formula)? What is usually an acceptable loss?

asked Feb 06 '17 by Himmators

People also ask

What does loss mean in AI?

That is, loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples.
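To make that concrete, here is a minimal sketch (plain Python, made-up numbers) of a per-example loss using squared error: zero for a perfect prediction, larger the further off the prediction is.

    # Squared-error loss on a single, hypothetical example.
    prediction, target = 2.5, 3.0
    loss = (prediction - target) ** 2
    print(loss)              # 0.25 -- imperfect prediction, positive loss
    print((3.0 - 3.0) ** 2)  # 0.0  -- perfect prediction, zero loss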

What is loss in an LSTM?

With compile(optimizer='adam', loss='mean_squared_error'), mean squared error is the loss your LSTM model is minimizing. Mean squared error, or MSE, is the default loss to use for regression problems. It is calculated as the average of the squared differences between the predicted and actual values.
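As a sketch of that definition (NumPy, with hypothetical values), MSE is just the mean of the squared differences:

    import numpy as np

    # MSE as described above: the average of the squared differences
    # between predicted and actual values (numbers are made up).
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.1, 1.9, 3.3])
    mse = np.mean((y_pred - y_true) ** 2)
    print(mse)  # ~0.0367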

What does loss mean in TensorFlow?

We use a loss function to measure how far the predicted values deviate from the actual values in the training data. We change the model weights to minimize the loss; that is what training is all about.
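A minimal sketch of that idea, with a toy one-weight model y = w * x (everything here is made up for illustration): repeatedly nudge the weight against the gradient of the loss.

    # Gradient descent on one made-up example; the "model" is y = w * x.
    w, lr = 0.0, 0.1          # weight and learning rate
    x, y_true = 2.0, 4.0      # the true relation would be w = 2
    for step in range(20):
        y_pred = w * x
        loss = (y_pred - y_true) ** 2
        grad = 2 * (y_pred - y_true) * x  # d(loss)/d(w)
        w -= lr * grad                    # change the weight to reduce the loss
    print(w)  # ~2.0 -- the loss is now near its minimum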

What does a loss function return?

With binary cross-entropy, the 'sum' reduction means the loss function returns the sum of the per-sample losses in the batch: bce = tf.keras.losses.BinaryCrossentropy(reduction='sum'); bce(y_true, y_pred).numpy(). Using the 'none' reduction returns the full array of per-sample losses.
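Here is that snippet made self-contained (TensorFlow 2 assumed; the labels and predictions are invented so it runs):

    import tensorflow as tf

    y_true = [[0.0], [1.0], [1.0], [0.0]]   # hypothetical labels
    y_pred = [[0.1], [0.8], [0.6], [0.3]]   # hypothetical predictions

    # 'sum' reduction: a single number, the sum of per-sample losses.
    bce_sum = tf.keras.losses.BinaryCrossentropy(reduction='sum')
    print(bce_sum(y_true, y_pred).numpy())   # ~1.196

    # 'none' reduction: the full array of per-sample losses.
    bce_none = tf.keras.losses.BinaryCrossentropy(reduction='none')
    print(bce_none(y_true, y_pred).numpy())  # ~[0.105, 0.223, 0.511, 0.357]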


2 Answers

From: https://github.com/tensorflow/models/blob/master/im2txt/im2txt/show_and_tell_model.py

  # Per-token cross-entropy between the target words and the model's
  # predicted distribution over the vocabulary (softmax over the logits).
  losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets,
                                                          logits=logits)
  # Weighted average over the batch; in this model `weights` is the input
  # mask, so padding positions don't contribute to the loss.
  batch_loss = tf.div(tf.reduce_sum(tf.multiply(losses, weights)),
                      tf.reduce_sum(weights),
                      name="batch_loss")

softmax is basically a smoothed max function that is differentiable (you can look up the exact definition in the docs). It gives high values to the largest activations. It can have multiple high activations, and the cross-entropy then penalizes every wrong one.
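A minimal NumPy sketch of softmax, to show the "smooth max" behavior: exponentiate, then normalize. The largest activation dominates, but smoothly, so the function stays differentiable.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))  # shift for numerical stability
        return e / e.sum()

    print(softmax([2.0, 1.0, 0.1]))  # ~[0.66, 0.24, 0.10]
    print(softmax([5.0, 1.0, 1.0]))  # ~[0.96, 0.02, 0.02] -- close to argmax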

Loss is the thing that you want the model to optimize down.

An absolute loss value doesn't usually mean much unless you've trained lots of similar models with the exact same loss. Usually you look at the loss graph to see when the model has stopped making progress, so that you can stop training. Also keep in mind that for other models you may add extra terms to the loss for things you want to optimize (say, some input weights that you want to average to 1.0); these bump up the loss, but that doesn't mean the model is worse.
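A sketch of that last point (TensorFlow 2 assumed; the names data_loss and input_weights are made up): a penalty pushing some weights to average 1.0 raises the reported loss without making the model's predictions any worse.

    import tensorflow as tf

    input_weights = tf.Variable([0.5, 1.3, 0.9])  # hypothetical weights
    data_loss = 0.5                               # pretend loss from the data
    # Penalty that is zero when the weights average exactly 1.0.
    penalty = tf.square(tf.reduce_mean(input_weights) - 1.0)
    total_loss = data_loss + penalty
    print(total_loss.numpy())  # ~0.51 -- higher, yet predictions are unchanged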

If you want to figure out whether your model is good or bad, add metrics for the things you care about. The obvious ones are precision/recall/accuracy. There are predefined metrics you can use (e.g. streaming_accuracy). Alternatively, you can compute the metric yourself and add it as a summary, but then it won't be available for the eval dataset.
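streaming_accuracy lives in TF1's tf.contrib.metrics; in TensorFlow 2 the same streaming idea is a stateful Keras metric. A sketch under that assumption, with hypothetical labels and predictions:

    import tensorflow as tf

    # Accumulates correct/total counts across batches, like the old
    # tf.contrib.metrics.streaming_accuracy.
    acc = tf.keras.metrics.Accuracy()
    acc.update_state(y_true=[1, 2, 3, 4], y_pred=[1, 2, 3, 0])  # batch 1
    acc.update_state(y_true=[5, 6], y_pred=[5, 6])              # batch 2
    print(acc.result().numpy())  # 5 correct out of 6 -> ~0.833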

Another option is to set up a model that's obviously bad (constant or random output) and compare its loss with what you are getting.
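For a classifier you don't even need to train the bad model: a uniform random predictor over N classes has cross-entropy ln(N), which makes a handy yardstick. A quick sketch, assuming a 10-class problem:

    import numpy as np

    # Loss of an obviously bad (uniform random) 10-class classifier.
    num_classes = 10
    baseline_loss = -np.log(1.0 / num_classes)
    print(baseline_loss)  # ~2.303 -- a trained model should sit well below this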

answered Oct 13 '22 by Sorin


Loss is the target function that the optimization algorithm will try to minimize.

In general, you want your loss function to be a measure of how bad your model is. But because the optimization algorithms require a few mathematical properties to work nicely, you can't pick the usual measures like precision and recall (you need continuous functions that are differentiable with respect to the model parameters).
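To see why, compare accuracy with cross-entropy on one hypothetical example whose true label is 1: accuracy is a step function of the predicted probability (zero gradient almost everywhere), while cross-entropy changes smoothly.

    import numpy as np

    for p in [0.49, 0.51, 0.90]:            # model's predicted P(label = 1)
        accuracy = 1.0 if p > 0.5 else 0.0  # step: jumps at 0.5, flat elsewhere
        cross_entropy = -np.log(p)          # smooth and differentiable in p
        print(p, accuracy, round(cross_entropy, 3))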

With classification tasks, softmax cross-entropy is a common choice: softmax is a smooth, well-behaved version of argmax, which would pick the class with the highest network activation. With regression, the usual mean squared error serves fine.

answered Oct 13 '22 by villasv