Having two different functions is a convenience, as they produce the same result.
The difference is simple:

For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in the range [0, num_classes-1].

For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64. The labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have a loss of 0 on this label.
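A minimal sketch of the shape difference, using made-up logits and labels for a batch of 2 examples and 3 classes (TF 1.x style, like the snippet further down):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])              # shape [batch_size, num_classes]

sparse_labels = tf.constant([0, 1], dtype=tf.int32)  # shape [batch_size], class indices
dense_labels = tf.one_hot(sparse_labels, depth=3)    # shape [batch_size, num_classes], one-hot rows

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))       # both give the same per-example losses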
I would just like to add two things to the accepted answer that you can also find in the TF documentation.
First:
tf.nn.softmax_cross_entropy_with_logits
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
Second:
tf.nn.sparse_softmax_cross_entropy_with_logits
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
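To illustrate the first note, here is a small sketch with made-up values where the labels row is a valid probability distribution but not one-hot; softmax_cross_entropy_with_logits accepts it, while the sparse version only takes a single integer class index per row:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
soft_labels = tf.constant([[0.7, 0.2, 0.1]])   # rows sum to 1, but are not one-hot

# allowed: each row of labels is a valid probability distribution
soft_loss = tf.nn.softmax_cross_entropy_with_logits(labels=soft_labels, logits=logits)

# the sparse version instead expects one integer class index per row, e.g. [0]
hard_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.constant([0]), logits=logits)

with tf.Session() as sess:
    print(sess.run([soft_loss, hard_loss]))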
Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of converting them with one-hot encoding first.
You can verify this by running the following program:
import tensorflow as tf
from random import randint

dims = 8
pos = randint(0, dims - 1)

# a random logits vector of length dims and a one-hot label with a 1 at position pos
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

res1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)
Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and the others are 0). After that I compute the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to make sure it always produces the same output.
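Note that the snippet above is TensorFlow 1.x code (Session based). Under TensorFlow 2.x, where eager execution is the default, a roughly equivalent check (just a sketch, assuming the TF 2.x API) would be:

import tensorflow as tf   # assuming TensorFlow 2.x, eager execution
from random import randint

dims = 8
pos = randint(0, dims - 1)

logits = tf.random.uniform([dims], maxval=3, dtype=tf.float32)

res1 = tf.nn.softmax_cross_entropy_with_logits(labels=tf.one_hot(pos, dims), logits=logits)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.constant(pos), logits=logits)

print(res1.numpy(), res2.numpy())   # the two values agree (up to floating point)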