Cost function always returning zero for a binary classification in tensorflow

Question

I have written the following binary classification program in tensorflow that is buggy. The cost is returning to be zero all the time no matter what the input is. I am trying to debug a larger program which is not learning anything from the data. I have narrowed down at least one bug to the cost function always returning zero. The given program is using some random inputs and is having the same problem. self.X_train and self.y_train is originally supposed to read from files and the function self.predict() has more layers forming a feedforward neural network.

import numpy as np
import tensorflow as tf

class annClassifier():

    def __init__(self):

        with tf.variable_scope("Input"):
             self.X = tf.placeholder(tf.float32, shape=(100, 11))

        with tf.variable_scope("Output"):
            self.y = tf.placeholder(tf.float32, shape=(100, 1))

        self.X_train = np.random.rand(100, 11)
        self.y_train = np.random.randint(0,2, size=(100, 1))

    def predict(self):

        with tf.variable_scope('OutputLayer'):
            weights = tf.get_variable(name='weights',
                                      shape=[11, 1],
                                      initializer=tf.contrib.layers.xavier_initializer())
            bases = tf.get_variable(name='bases',
                                    shape=[1],
                                    initializer=tf.zeros_initializer())
            final_output = tf.matmul(self.X, weights) + bases

        return final_output

    def train(self):

        prediction = self.predict()
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=self.y))

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())         
            print(sess.run(cost, feed_dict={self.X:self.X_train, self.y:self.y_train}))


with tf.Graph().as_default():
    classifier = annClassifier()
    classifier.train()

If someone could please figure out what I am doing wrong in this, I can try making the same change in my original program. Thanks a lot!

lejlot · Accepted Answer

The only problem is invalid cost used. softmax_cross_entropy_with_logits should be used if you have more than two classes, as softmax of a single output always returns 1, as it is defined as :

softmax(x)_i = exp(x_i) / SUM_j exp(x_j)

so for a single number (one dimensional output)

softmax(x) = exp(x) / exp(x) = 1

Furthermore, for softmax output TF expects one-hot encoded labels, so if you provide only 0 or 1, there are two possibilities:

True label is 0, so the cost is -0*log(1) = 0
True label is 1, so the cost is -1*log(1) = 0

Tensorflow has a separate function to handle binary classification which applies sigmoid instead (note, that the same function for more than one output would apply sigmoid independently on each dimension which is what multi-label classification would expect):

tf.sigmoid_cross_entropy_with_logits

just switch to this cost and you are good to go, you do not have to encode anything as one-hot anymore either, as this function is designed solely to be used for your use-case.

The only missing bit is that .... your code does not have actual training routine you need to define optimiser, ask it to minimise a loss and then run a train op in the loop. In your current setting you just try to predict over and over, with the network which never changes.

In particular, please refer to Cross Entropy Jungle question on SO which provides more detailed description of all these different helper functions in TF (and other libraries), which have different requirements/use cases.

Cost function always returning zero for a binary classification in tensorflow

Tags:

python

artificial-intelligence

machine-learning

neural-network

tensorflow

Ananda

1 Answers

lejlot

Recent Activity

Donate For Us

Cost function always returning zero for a binary classification in tensorflow

Tags:

python

artificial-intelligence

machine-learning

neural-network

tensorflow

Ananda

1 Answers

lejlot

Related questions

Recent Activity

Donate For Us