I tried to manually recompute the outputs of this function, so I created a minimal example:
import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()
logits = tf.pack(np.array([[[[0, 1, 2]]]], dtype=np.float32))  # image of shape (1, 1, 1, 3)
labels = tf.pack(np.array([[[1]]], dtype=np.int32))            # ground truth of shape (1, 1, 1)
softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
softmaxCrossEntropie.eval()  # --> output is [1.41]
Now according to my own calculation I only get [1.23]. When manually calculating, I'm simply applying softmax and cross-entropy:

H(p, q) = -sum_x p(x) * log(q(x))

where q(x) = sigma(x_j) or (1 - sigma(x_j)), depending on whether j is the correct ground-truth class or not, and p(x) = labels, which are then one-hot encoded.
I'm not sure where the difference originates from. I can't really imagine that some epsilon causes such a big difference. Does someone know where I can look up which exact formula TensorFlow uses?
Is the source code of that exact part available? I could only find nn_ops.py, but it only uses another function called gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which I couldn't find on github...
Well, usually p(x) in the cross-entropy equation is the true distribution, while q(x) is the distribution obtained from the softmax. So, if p(x) is one-hot (and it is, otherwise sparse cross-entropy could not be applied), the cross-entropy is just the negative log of the probability of the true category.

In your example, softmax(logits) is the vector [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059, which is exactly what you got as output.
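For reference, here is a small NumPy sketch of that calculation. It is not TensorFlow's actual implementation, just the standard softmax followed by the negative log of the true-class probability, and it reproduces the numbers above:

import numpy as np

logits = np.array([0.0, 1.0, 2.0])
label = 1  # index of the ground-truth class

# standard softmax over the logits
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)  # [0.09003057 0.24472847 0.66524096]

# sparse softmax cross-entropy: negative log of the true-class probability
loss = -np.log(probs[label])
print(loss)   # approx. 1.4076059, i.e. the [1.41] from the question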