Tensorflow single sigmoid output with log loss vs two linear outputs with sparse softmax cross entropy loss for binary classification

I am experimenting with a binary classifier implementation in TensorFlow. If I have two plain outputs (i.e. no activation) in the final layer and use tf.losses.sparse_softmax_cross_entropy, my network trains as expected. However, if I change the output layer to produce a single output with a tf.sigmoid activation and use tf.losses.log_loss as the loss function, my network does not train (i.e. loss/accuracy does not improve).

Here is what my output layer/loss function looks like in the first (i.e. working) case:

out = tf.layers.dense(prev, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=out)

In the second case, I have the following:

out = tf.layers.dense(prev, 1, activation=tf.sigmoid)
loss = tf.losses.log_loss(labels=y, predictions=out)

Tensor y is a vector of 0/1 values; it is not one-hot encoded. The network learns as expected in the first case, but not in the second case. Apart from these two lines, everything else is kept the same.

I do not understand why the second set-up does not work. Interestingly, if I express the same network in Keras and use the second set-up, it works. Am I using the wrong TensorFlow functions to express my intent in the second case? I'd like to produce a single sigmoid output and use binary cross-entropy loss to train a simple binary classifier.

I'm using Python 3.6 and TensorFlow 1.4.

Here is a small, runnable Python script to demonstrate the issue. Note that you need to have downloaded the StatOil/C-CORE dataset from Kaggle to be able to run the script as is.

Thanks!

1 Answer

Using a sigmoid activation on two outputs doesn't give you a probability distribution:

import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()  # evaluate tensors immediately so we can print their values

start = tf.constant([[4., 5.]])  # a single dummy input example
out_dense = tf.layers.dense(start, units=2)
print("Logits (un-transformed)", out_dense)
out_sigmoid = tf.layers.dense(start, units=2, activation=tf.sigmoid)
print("Elementwise sigmoid", out_sigmoid)  # each output squashed independently; no constraint to sum to 1
out_softmax = tf.nn.softmax(tf.layers.dense(start, units=2))
print("Softmax (probability distribution)", out_softmax)  # outputs sum to 1 across the two classes

Prints:

Logits (un-transformed) tf.Tensor([[-3.64021587  6.90115976]], shape=(1, 2), dtype=float32)
Elementwise sigmoid tf.Tensor([[ 0.94315267  0.99705648]], shape=(1, 2), dtype=float32)
Softmax (probability distribution) tf.Tensor([[ 0.05623185  0.9437682 ]], shape=(1, 2), dtype=float32)

Instead of tf.nn.softmax, you could also use tf.sigmoid on a single logit, then set the other output to one minus that.
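
For what it's worth, here is a minimal sketch of that single-logit alternative, assuming the same eager setup as the snippet above; the concat just makes the two class probabilities explicit so you can check they sum to 1:

single_logit = tf.layers.dense(start, units=1)               # one raw (un-activated) output
p_positive = tf.sigmoid(single_logit)                        # probability of class 1
p_both = tf.concat([1. - p_positive, p_positive], axis=1)    # the two columns now form a valid distribution
print("Sigmoid-based distribution", p_both)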
