In the Keras backend, `K.binary_crossentropy` has a `from_logits` flag. What is the difference between normal binary crossentropy and binary crossentropy with logits? Suppose I am using a seq2seq model and my output sequence is of the form 100111100011101.

What should I use for a recurrent LSTM or RNN to learn from this data, given that I feed a similar sequence as input along with timesteps?
This depends on whether or not you have a sigmoid layer just before the loss function.
If there is a sigmoid layer, it squeezes the class scores into probabilities, so in this case `from_logits` should be `False`. The loss function will transform the probabilities back into logits, because that is what `tf.nn.sigmoid_cross_entropy_with_logits` expects.

If the output is already a logit (i.e., a raw score), pass `from_logits=True`; no transformation will be made.
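To make this concrete, here is a minimal sketch (the tensor values are made up for illustration) showing that the two wirings compute the same loss, whether you pass raw logits with `from_logits=True` or sigmoid-squashed probabilities with `from_logits=False`:

```python
import tensorflow as tf

y_true = tf.constant([[1.0, 0.0, 1.0]])
logits = tf.constant([[2.0, -1.0, 0.5]])  # raw scores, no sigmoid applied

# Case 1: the model outputs raw logits -> pass them directly
loss_from_logits = tf.keras.losses.binary_crossentropy(
    y_true, logits, from_logits=True)

# Case 2: a sigmoid layer already turned the scores into probabilities
probs = tf.sigmoid(logits)
loss_from_probs = tf.keras.losses.binary_crossentropy(
    y_true, probs, from_logits=False)

# Both paths should yield the same loss, up to numerical precision
print(loss_from_logits.numpy(), loss_from_probs.numpy())
```

The `from_logits=True` path is slightly more numerically stable, since the sigmoid and the log in the cross-entropy are fused internally instead of being applied one after the other.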
Both options are possible, and the choice depends on your network architecture. By the way, if the term logit seems scary, take a look at this question, which discusses it in detail.
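For the sequence-labelling setup in the question, either architecture works; this is a hypothetical sketch (the layer sizes and the 15-timestep shape are assumptions, not from the question) showing how each choice pairs with the matching `from_logits` setting:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed input shape: 15 timesteps, 1 binary feature per step
inputs = layers.Input(shape=(15, 1))
x = layers.LSTM(32, return_sequences=True)(inputs)

# Variant A: sigmoid inside the model -> from_logits=False (the default)
probs = layers.Dense(1, activation="sigmoid")(x)
model_a = Model(inputs, probs)
model_a.compile(optimizer="adam",
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=False))

# Variant B: raw scores out of the model -> from_logits=True
logits = layers.Dense(1)(x)  # note: no activation
model_b = Model(inputs, logits)
model_b.compile(optimizer="adam",
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```

With variant B, remember to apply a sigmoid yourself at inference time if you want probabilities out of `model_b.predict`.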