In TensorFlow, what is the difference between sampled_softmax_loss and softmax_cross_entropy_with_logits?

In TensorFlow, there are methods called softmax_cross_entropy_with_logits and sampled_softmax_loss.

I read the TensorFlow documentation and searched Google for more information, but I couldn't find the difference. It looks to me like both calculate the loss using the softmax function.

Using sampled_softmax_loss to calculate the loss

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(...))

Using softmax_cross_entropy_with_logits to calculate the loss

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=P, labels=Q))

To me, calculating the softmax loss is the same as computing the cross entropy of the softmaxed logits (e.g. cross_entropy(softmax(train_x))).
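
Here is a minimal sketch of the equivalence I mean (all tensor values are made up, and it is written against the TF 2 eager API, which postdates this question):

import tensorflow as tf

# Toy batch: 2 examples, 5 classes (all values made up).
logits = tf.constant([[2.0, 1.0, 0.1, 0.5, 0.3],
                      [0.2, 3.0, 0.4, 0.1, 1.0]])
labels = tf.constant([[1.0, 0.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0, 0.0]])

# Fused op: cross entropy computed directly from the logits.
fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Manual two-step version: softmax first, then cross entropy.
probs = tf.nn.softmax(logits)
manual = -tf.reduce_sum(labels * tf.math.log(probs), axis=1)

print(fused.numpy())   # same values as the manual version
print(manual.numpy())  # (the fused op is just more numerically stable)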

Could somebody tell me why there are two different methods, and in which cases I should use each one?

asked Feb 06 '16 by HongKun Yoo

2 Answers

If your target vocabulary (in other words, the number of classes you want to predict) is really big, it is very hard to use the regular softmax, because you have to calculate the probability for every word in the dictionary. By using sampled_softmax_loss, you only take into account a subset V of your vocabulary when calculating your loss.

Sampled softmax only makes sense if we sample fewer classes (our V) than the vocabulary size. If your vocabulary (the number of labels) is small, there is no point in using sampled_softmax_loss.
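
Here is a minimal sketch of how sampled_softmax_loss is typically wired up (all sizes and variable names are made up; written against the TF 2 eager API):

import tensorflow as tf

# Hypothetical sizes for a large-vocabulary model (all numbers made up).
vocab_size, embed_dim, batch_size, num_sampled = 50000, 128, 32, 64

# Output projection. Note the weights are [num_classes, dim], i.e.
# transposed relative to a usual dense layer.
softmax_w = tf.Variable(tf.random.normal([vocab_size, embed_dim]))
softmax_b = tf.Variable(tf.zeros([vocab_size]))

hidden = tf.random.normal([batch_size, embed_dim])  # stand-in for the model output
target_ids = tf.random.uniform([batch_size, 1], maxval=vocab_size, dtype=tf.int64)

# The loss is computed over only num_sampled sampled classes
# instead of all vocab_size classes.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=softmax_w, biases=softmax_b,
    labels=target_ids, inputs=hidden,
    num_sampled=num_sampled, num_classes=vocab_size))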

You can see the implementation details in this paper (On Using Very Large Target Vocabulary for Neural Machine Translation): http://arxiv.org/pdf/1412.2007v2.pdf

You can also see an example where it is used: TensorFlow's sequence-to-sequence translation tutorial.

answered Oct 01 '22 by Farseer


Sampled:

"Sampled", in both cases, means you don't compute the loss over everything that is possible as an output. For example, in NLP problems there may be too many words in the dictionary to take all of them into account at each gradient step, so we take just a few samples and learn from those.

softmax_cross_entropy_with_logits:

This computes the cross entropy: it receives logits as input and yields a tensor that can be used as a loss.

sampled_softmax_loss:

This is a sampled version of softmax_cross_entropy_with_logits: it draws just a few sampled classes before computing the cross entropy, rather than computing the full cross entropy over the whole vocabulary: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/python/ops/nn_impl.py#L1269
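
A common pattern, roughly sketched below (the function and variable names are made up), is to use the cheap sampled loss during training and the exact full softmax cross entropy for evaluation or inference:

import tensorflow as tf

def train_loss(weights, biases, hidden, target_ids, num_sampled, vocab_size):
    # Cheap approximation: only num_sampled sampled classes per step.
    return tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases, labels=target_ids,
        inputs=hidden, num_sampled=num_sampled, num_classes=vocab_size))

def eval_loss(weights, biases, hidden, target_ids, vocab_size):
    # Exact loss: full softmax over every class, as used at test time.
    logits = tf.matmul(hidden, weights, transpose_b=True) + biases
    labels = tf.one_hot(tf.squeeze(target_ids, axis=1), vocab_size)
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))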

answered Oct 01 '22 by Guillaume Chevalier