Having two different functions is a convenience, as they produce the same result.
The difference is simple:

For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in the range [0, num_classes-1].

For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64. The labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have a loss of 0 on this label.
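A minimal sketch of the shape difference, using made-up logits and labels for a batch of 2 examples and 3 classes (TF 1.x style, like the snippet further down):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])              # shape [batch_size, num_classes]

sparse_labels = tf.constant([0, 1], dtype=tf.int32)  # shape [batch_size], class indices
dense_labels = tf.one_hot(sparse_labels, depth=3)    # shape [batch_size, num_classes], one-hot rows

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))       # both give the same per-example losses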
I would just like to add two things to the accepted answer that you can also find in the TF documentation.
First:
tf.nn.softmax_cross_entropy_with_logits
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
Second:
tf.nn.sparse_softmax_cross_entropy_with_logits
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
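To illustrate the first note, here is a small sketch with made-up values where the labels row is a valid probability distribution but not one-hot; softmax_cross_entropy_with_logits accepts it, while the sparse version only takes a single integer class index per row:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
soft_labels = tf.constant([[0.7, 0.2, 0.1]])   # rows sum to 1, but are not one-hot

# allowed: each row of labels is a valid probability distribution
soft_loss = tf.nn.softmax_cross_entropy_with_logits(labels=soft_labels, logits=logits)

# the sparse version instead expects one integer class index per row, e.g. [0]
hard_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.constant([0]), logits=logits)

with tf.Session() as sess:
    print(sess.run([soft_loss, hard_loss]))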
Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of converting them with one-hot encoding first.
You can verify this by running the following program:
import tensorflow as tf
from random import randint

dims = 8
pos = randint(0, dims - 1)

# a random logits vector of length dims and a one-hot label with a 1 at position pos
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

res1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)
Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and the others are 0). After that I compute the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to make sure it always produces the same output.
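Note that the snippet above is TensorFlow 1.x code (Session based). Under TensorFlow 2.x, where eager execution is the default, a roughly equivalent check (just a sketch, assuming the TF 2.x API) would be:

import tensorflow as tf   # assuming TensorFlow 2.x, eager execution
from random import randint

dims = 8
pos = randint(0, dims - 1)

logits = tf.random.uniform([dims], maxval=3, dtype=tf.float32)

res1 = tf.nn.softmax_cross_entropy_with_logits(labels=tf.one_hot(pos, dims), logits=logits)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.constant(pos), logits=logits)

print(res1.numpy(), res2.numpy())   # the two values agree (up to floating point)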