I am experimenting with a binary classifier implementation in TensorFlow. If I have two plain outputs (i.e. no activation) in the final layer and use tf.losses.sparse_softmax_cross_entropy
, my network trains as expected. However, if I change the output layer to produce a single output with a tf.sigmoid
activation and use tf.losses.log_loss
as the loss function, my network does not train (i.e. loss/accuracy does not improve).
Here is what my output layer/loss function looks like in the first (i.e. working) case:
out = tf.layers.dense(prev, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=out)
In the second case, I have the following:
out = tf.layers.dense(prev, 1, activation=tf.sigmoid)
loss = tf.losses.log_loss(labels=y, predictions=out)
Tensor y
is a vector of 0
/1
values; it is not one-hot encoded. The network learns as expected in the first case, but not in the second case. Apart from these two lines, everything else is kept the same.
I do not understand why the second set-up does not work. Interestingly, if I express the same network in Keras and use the second set-up, it works. Am I using the wrong TensorFlow functions to express my intent in the second case? I'd like to produce a single sigmoid output and use binary cross-entropy loss to train a simple binary classifier.
I'm using Python 3.6 and TensorFlow 1.4.
Here is a small, runnable Python script to demonstrate the issue. Note that you need to have downloaded the StatOil/C-CORE dataset from Kaggle to be able to run the script as is.
Thanks!
Using a sigmoid
activation on two outputs doesn't give you a probability distribution:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
start = tf.constant([[4., 5.]])
out_dense = tf.layers.dense(start, units=2)
print("Logits (un-transformed)", out_dense)
out_sigmoid = tf.layers.dense(start, units=2, activation=tf.sigmoid)
print("Elementwise sigmoid", out_sigmoid)
out_softmax = tf.nn.softmax(tf.layers.dense(start, units=2))
print("Softmax (probability distribution)", out_softmax)
Prints:
Logits (un-transformed) tf.Tensor([[-3.64021587 6.90115976]], shape=(1, 2), dtype=float32)
Elementwise sigmoid tf.Tensor([[ 0.94315267 0.99705648]], shape=(1, 2), dtype=float32)
Softmax (probability distribution) tf.Tensor([[ 0.05623185 0.9437682 ]], shape=(1, 2), dtype=float32)
Instead of tf.nn.softmax
, you could also use tf.sigmoid
on a single logit, then set the other output to one minus that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With