Tensorflow: What exact formula is applied in `tf.nn.sparse_softmax_cross_entropy_with_logits`?

Tags:

tensorflow

I tried to manually recompute the outputs of this function so I created a minimal example:

import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()  # needed so that .eval() below has a default session

logits = tf.pack(np.array([[[[0,1,2]]]], dtype=np.float32))  # img of shape (1, 1, 1, 3)
labels = tf.pack(np.array([[[1]]], dtype=np.int32))          # gt of shape (1, 1, 1)

softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
softmaxCrossEntropie.eval()  # --> output is [1.41]

Now, according to my own calculation, I only get [1.23]. When calculating manually, I'm simply applying the softmax

sigma(x_j) = exp(x_j) / sum_k exp(x_k)

and cross-entropy:

H(p, q) = - sum_x p(x) * log(q(x))

where q(x) = sigma(x_j) or (1 - sigma(x_j)), depending on whether j is the correct ground-truth class or not, and p(x) are the labels, which are one-hot encoded.

I'm not sure where the difference comes from. I can't really imagine that some epsilon would cause such a big difference. Does anyone know where I can look up which exact formula TensorFlow uses? Is the source code of that exact part available?
I could only find nn_ops.py, but it just calls another function, gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which I couldn't find on GitHub...
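
For reference, a sketch of the same minimal example against the current API (this assumes TensorFlow 2.x with eager execution, where tf.pack is gone and the arguments are passed by keyword; tf.constant is used in its place):

import numpy as np
import tensorflow as tf

logits = tf.constant(np.array([[[[0, 1, 2]]]], dtype=np.float32))  # img of shape (1, 1, 1, 3)
labels = tf.constant(np.array([[[1]]], dtype=np.int32))            # gt of shape (1, 1, 1)

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(loss.numpy())  # --> [[[1.4076059]]]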

asked Apr 13 '17 by mcExchange


1 Answer

Well, usually p(x) in the cross-entropy equation is the true distribution, while q(x) is the distribution obtained from the softmax. So if p(x) is one-hot (and it is here, otherwise sparse cross-entropy could not be applied), the cross-entropy is just the negative log of the probability of the true category.
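
Written out for a single example with integer label y (just restating that reduction in formula form):

loss = -log( softmax(logits)[y] ) = -logits[y] + log( sum_j exp(logits[j]) )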

In your example, softmax(logits) is the vector [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059, which is exactly what you got as output.
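
For completeness, the same numbers can be reproduced with plain NumPy (a quick sketch, independent of TensorFlow):

import numpy as np

logits = np.array([0., 1., 2.])
q = np.exp(logits) / np.sum(np.exp(logits))  # softmax: [0.09003057, 0.24472847, 0.66524096]
print(-np.log(q[1]))                         # label index 1 --> 1.4076059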

answered Nov 06 '22 by Dmitriy Danevskiy