Tensorflow single sigmoid output with log loss vs two linear outputs with sparse softmax cross entropy loss for binary classification

Question

I am experimenting with a binary classifier implementation in TensorFlow. If I have two plain outputs (i.e. no activation) in the final layer and use tf.losses.sparse_softmax_cross_entropy, my network trains as expected. However, if I change the output layer to produce a single output with a tf.sigmoid activation and use tf.losses.log_loss as the loss function, my network does not train (i.e. loss/accuracy does not improve).

Here is what my output layer/loss function looks like in the first (i.e. working) case:

out = tf.layers.dense(prev, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=out)
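For reference, tf.losses.sparse_softmax_cross_entropy takes integer class indices as labels plus unnormalized logits, and applies the softmax itself. Below is a plain-Python sketch of the per-example value it averages over the batch. The helper name is hypothetical, and weights are ignored.

```python
import math

def sparse_softmax_xent(logits, label):
    # logits: unnormalized scores, one per class; label: integer class index
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label]) == logsumexp(logits) - logits[label]
    return log_sum_exp - logits[label]

# Two outputs, label 1: the loss is small when logit 1 dominates
print(sparse_softmax_xent([-1.0, 2.0], 1))
print(sparse_softmax_xent([-1.0, 2.0], 0))
# Equal logits mean a 50/50 prediction, so the loss is log(2)
print(sparse_softmax_xent([0.0, 0.0], 0))
```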

In the second case, I have the following:

out = tf.layers.dense(prev, 1, activation=tf.sigmoid)
loss = tf.losses.log_loss(labels=y, predictions=out)

Tensor y is a vector of 0/1 values; it is not one-hot encoded. The network learns as expected in the first case, but not in the second case. Apart from these two lines, everything else is kept the same.
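Because out has shape (batch, 1) while y is a flat (batch,) vector, one thing that may be worth ruling out is the two being broadcast against each other inside the loss. This is an assumption, since the full script isn't reproduced here. tf.losses.log_loss computes binary cross-entropy elementwise, -(y*log(p + eps) + (1 - y)*log(1 - p + eps)) with eps = 1e-7 by default. If the label and prediction shapes don't line up, that elementwise computation can produce a (batch, batch) grid comparing every prediction with every label. The plain-Python sketch below uses hypothetical helper names to mimic both cases:

```python
import math

def log_loss_elem(label, prob, epsilon=1e-7):
    # Elementwise binary cross-entropy, with epsilon added inside the logs
    return -(label * math.log(prob + epsilon)
             + (1 - label) * math.log(1 - prob + epsilon))

def matched_loss(probs, labels):
    # Shapes line up: prediction i is compared only with label i
    return sum(log_loss_elem(y, p) for p, y in zip(probs, labels)) / len(labels)

def broadcast_loss(probs, labels):
    # Mimics (batch, 1) predictions broadcast against (batch,) labels:
    # every prediction is compared with every label, then everything is averaged
    n = len(labels)
    return sum(log_loss_elem(y, p) for p in probs for y in labels) / (n * n)

labels = [0, 1, 1, 0]
near_perfect = [0.01, 0.99, 0.99, 0.01]  # close to the true label for each example
constant = [0.5, 0.5, 0.5, 0.5]          # ignores the input entirely

print("matched:  ", matched_loss(near_perfect, labels), matched_loss(constant, labels))
print("broadcast:", broadcast_loss(near_perfect, labels), broadcast_loss(constant, labels))
```

With matched shapes the near-perfect predictions have the lower loss. Under broadcasting the constant predictor wins, because each prediction's best value becomes the mean label no matter which example it belongs to. If this turns out to be the cause, giving y a trailing dimension (for example with tf.expand_dims(y, 1)) so that it matches out would be one way to line the shapes up.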

I do not understand why the second set-up does not work. Interestingly, if I express the same network in Keras and use the second set-up, it works. Am I using the wrong TensorFlow functions to express my intent in the second case? I'd like to produce a single sigmoid output and use binary cross-entropy loss to train a simple binary classifier.

I'm using Python 3.6 and TensorFlow 1.4.

Here is a small, runnable Python script to demonstrate the issue. Note that you need to have downloaded the StatOil/C-CORE dataset from Kaggle to be able to run the script as is.

Thanks!

Allen Lavoie · Accepted Answer

Applying a sigmoid to each of two outputs independently doesn't give you a probability distribution, since the two values need not sum to 1:

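import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()  # TF 1.x eager mode, so printed tensors show their values

start = tf.constant([[4., 5.]])
# Each tf.layers.dense call below creates its own randomly initialized weights
out_dense = tf.layers.dense(start, units=2)
print("Logits (un-transformed)", out_dense)
# Sigmoid applied to each output independently: the values need not sum to 1
out_sigmoid = tf.layers.dense(start, units=2, activation=tf.sigmoid)
print("Elementwise sigmoid", out_sigmoid)
# Softmax normalizes the two logits jointly, so the outputs sum to 1
out_softmax = tf.nn.softmax(tf.layers.dense(start, units=2))
print("Softmax (probability distribution)", out_softmax)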

Prints (the weights are randomly initialized, so the exact values differ from run to run):

Logits (un-transformed) tf.Tensor([[-3.64021587  6.90115976]], shape=(1, 2), dtype=float32)
Elementwise sigmoid tf.Tensor([[ 0.94315267  0.99705648]], shape=(1, 2), dtype=float32)
Softmax (probability distribution) tf.Tensor([[ 0.05623185  0.9437682 ]], shape=(1, 2), dtype=float32)

Instead of tf.nn.softmax, you could also apply tf.sigmoid to a single logit and set the other output to one minus that value.
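Here is a minimal sketch of that suggestion in plain Python, without TensorFlow. The helper names are hypothetical. Taking p = sigmoid(z) and 1 - p for a single logit z gives exactly the softmax over the two logits [0, z]. As a result, binary cross-entropy on the sigmoid output equals sparse softmax cross-entropy on those two logits.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def two_outputs_from_one_logit(z):
    # Sigmoid on one logit; the other output is one minus that value
    p = sigmoid(z)
    return [1.0 - p, p]

def binary_xent(label, p):
    # Binary cross-entropy on the probability of class 1
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def sparse_softmax_xent(logits, label):
    return -math.log(softmax(logits)[label])

for z in (-2.0, 0.0, 3.5):
    probs = two_outputs_from_one_logit(z)
    # Same two-class distribution either way, and it sums to 1
    print(z, probs, softmax([0.0, z]))
    for label in (0, 1):
        print("  label", label, binary_xent(label, probs[1]),
              sparse_softmax_xent([0.0, z], label))
```

In TF 1.x graph code, the same single-logit setup is commonly written by leaving the final dense layer linear and passing its output to tf.losses.sigmoid_cross_entropy. That function applies the sigmoid internally in a numerically stable way, and its labels should match the logits' shape.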

Tags: machine-learning, tensorflow, classification
