
What is the difference between binary crossentropy and binary crossentropy with logits in keras?

In the Keras backend, K.binary_crossentropy has a from_logits flag. What is the difference between normal binary crossentropy and binary crossentropy with logits? Suppose I am using a seq2seq model and my output sequence looks like 100111100011101.

Which one should I use for a recurrent network (LSTM or RNN) to learn from this data, given that I am feeding a similar sequence as input along with timesteps?

Subham Mukherjee asked Oct 30 '22 01:10


1 Answer

This depends on whether or not you have a sigmoid layer just before the loss function.

If there is a sigmoid layer, it squeezes the class scores into probabilities, so from_logits should be False. The loss function then transforms the probabilities back into logits, because that is what tf.nn.sigmoid_cross_entropy_with_logits expects.
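To make the "probabilities back into logits" step concrete, here is a minimal pure-Python sketch. The helper names and the clipping constant EPSILON are assumptions chosen for illustration, not the Keras source. It only shows that the inverse of the sigmoid recovers the raw score.

```python
import math

EPSILON = 1e-7  # assumed clipping constant, for illustration only


def sigmoid(z):
    """Squash a raw score (logit) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_to_logit(p, eps=EPSILON):
    """Invert the sigmoid: map a probability back to a raw score."""
    # Clip away from 0 and 1 so the logarithm stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


raw_score = 1.5
probability = sigmoid(raw_score)        # what a sigmoid layer would output
recovered = prob_to_logit(probability)  # what the loss would work with
print(probability, recovered)
```

The clipping matters only at the extremes: without it, a probability of exactly 0 or 1 would send the logarithm to infinity.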

If the output is already a logit (i.e. the raw score), pass from_logits=True and no transformation will be made.
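As a rough check that the two settings describe the same loss, the sketch below compares binary crossentropy computed from probabilities with the version computed directly from logits. It is plain Python with hypothetical function names, not the Keras or TensorFlow implementation. The logit version uses the numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|)), and the targets and scores are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Textbook binary crossentropy on a probability (the from_logits=False case)."""
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def bce_from_logit(y_true, z):
    """Binary crossentropy on a raw score (the from_logits=True case)."""
    # Stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


targets = [1, 0, 0, 1, 1]             # made-up binary labels
logits = [2.0, -1.0, 0.5, -3.0, 0.0]  # made-up raw scores

loss_a = [bce_from_probability(y, sigmoid(z)) for y, z in zip(targets, logits)]
loss_b = [bce_from_logit(y, z) for y, z in zip(targets, logits)]

for a, b in zip(loss_a, loss_b):
    print(round(a, 6), round(b, 6))
```

The two columns agree, so the choice comes down to where the sigmoid lives. What goes wrong is mixing them up: applying a sigmoid in the network and also passing from_logits=True squashes the scores twice.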

Both options are possible and the choice depends on your network architecture. By the way, if the term logit seems scary, take a look at this question, which discusses it in detail.

Maxim answered Nov 15 '22 07:11