I have written the following binary classification program in TensorFlow, and it is buggy: the cost is always zero no matter what the input is. I am trying to debug a larger program that is not learning anything from the data, and I have narrowed at least one bug down to the cost function always returning zero. The program below uses random inputs and has the same problem. In the original, self.X_train and self.y_train are read from files, and self.predict() has more layers, forming a feedforward neural network.
import numpy as np
import tensorflow as tf

class annClassifier():
    def __init__(self):
        with tf.variable_scope("Input"):
            self.X = tf.placeholder(tf.float32, shape=(100, 11))

        with tf.variable_scope("Output"):
            self.y = tf.placeholder(tf.float32, shape=(100, 1))

        self.X_train = np.random.rand(100, 11)
        self.y_train = np.random.randint(0, 2, size=(100, 1))

    def predict(self):
        with tf.variable_scope('OutputLayer'):
            weights = tf.get_variable(name='weights',
                                      shape=[11, 1],
                                      initializer=tf.contrib.layers.xavier_initializer())
            bases = tf.get_variable(name='bases',
                                    shape=[1],
                                    initializer=tf.zeros_initializer())
            final_output = tf.matmul(self.X, weights) + bases
        return final_output

    def train(self):
        prediction = self.predict()
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=self.y))

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            print(sess.run(cost, feed_dict={self.X: self.X_train, self.y: self.y_train}))

with tf.Graph().as_default():
    classifier = annClassifier()
    classifier.train()
If someone could please figure out what I am doing wrong in this, I can try making the same change in my original program. Thanks a lot!
The only problem is the choice of cost function. softmax_cross_entropy_with_logits should be used only when the network has one output per class (two or more outputs), because the softmax of a single output is always 1. Softmax is defined as:
softmax(x)_i = exp(x_i) / SUM_j exp(x_j)
so for a single number (one dimensional output)
softmax(x) = exp(x) / exp(x) = 1
Furthermore, with a softmax output TF expects one-hot encoded labels. If you instead provide a single 0 or 1 per example, the cross entropy -label * log(softmax(x)) has only two possible values, and both are zero:
-0*log(1) = 0
-1*log(1) = 0
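The argument above can be checked numerically without TensorFlow. The following is a minimal sketch in plain Python; the helper names softmax and softmax_cross_entropy are illustrative and are not TensorFlow functions. It shows that with a single output unit the loss is zero for any logit and either label.

```python
import math

def softmax(logits):
    """Softmax over a list of logits (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, labels):
    """Cross entropy between labels and softmax(logits) for one example."""
    probs = softmax(logits)
    return sum(-l * math.log(p) for l, p in zip(labels, probs))

# One output unit per example, as in the question: the logit value is irrelevant.
for logit in (-3.0, 0.0, 7.5):
    for label in (0.0, 1.0):
        prob = softmax([logit])[0]
        loss = softmax_cross_entropy([logit], [label])
        print("logit=%5.1f label=%.0f softmax=%.1f loss=%.1f" % (logit, label, prob, loss))
```

Every printed row reports a softmax of 1.0 and a loss of 0.0, which matches the constant zero cost seen in the question.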
TensorFlow has a separate function for binary classification, which applies a sigmoid instead. (Note that with more than one output the same function applies the sigmoid independently to each dimension, which is what multi-label classification expects.)
tf.nn.sigmoid_cross_entropy_with_logits
Just switch to this cost and you are good to go. You no longer have to encode anything as one-hot either, since a single 0/1 label per example is exactly what this function expects.
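For comparison with the softmax case, here is a small plain-Python sketch of what the sigmoid cross entropy computes for one example. The function name sigmoid_xent is hypothetical; the expression is the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), which is algebraically equal to -z*log(sigmoid(x)) - (1-z)*log(1 - sigmoid(x)).

```python
import math

def sigmoid_xent(logit, label):
    """Binary cross entropy for one example, computed from the raw logit.

    Stable form: max(x, 0) - x*z + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_sigmoid_xent(logit, label):
    """Textbook form, fine for moderate logits; used here only as a cross-check."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)

for logit in (-2.0, 0.0, 2.0):
    for label in (0.0, 1.0):
        print(logit, label, sigmoid_xent(logit, label), naive_sigmoid_xent(logit, label))
```

Unlike the single-output softmax, the loss here depends on both the logit and the label, so there is something for an optimizer to reduce.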
The only missing bit is that your code has no actual training routine. You need to define an optimiser, ask it to minimise the loss, and then run the resulting train op in a loop. In your current setup you only evaluate the cost of a network whose weights never change.
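To make the role of the train op concrete, here is a dependency-free sketch of the loop it implies: compute the loss, compute the gradient, update the weights, repeat. This is a plain-Python stand-in, not TensorFlow code. In TF 1.x the gradient and update would come from an optimiser, for example tf.train.GradientDescentOptimizer(...).minimize(cost), with the returned op run via sess.run inside the loop. The data is random, as in the question, and the learning rate and step count are untuned assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Random stand-in data with the question's shapes: 100 rows, 11 features.
X = [[random.random() for _ in range(11)] for _ in range(100)]
y = [float(random.randint(0, 1)) for _ in range(100)]

weights = [0.0] * 11
bias = 0.0
learning_rate = 0.1  # assumed value, not tuned

def logit(row):
    """Single linear output unit, like the OutputLayer in the question."""
    return sum(w * v for w, v in zip(weights, row)) + bias

def mean_loss():
    """Mean sigmoid cross entropy over the whole data set."""
    total = 0.0
    for row, label in zip(X, y):
        z = logit(row)
        total += max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))
    return total / len(X)

losses = [mean_loss()]
for step in range(200):
    # d(loss)/d(logit) for sigmoid cross entropy is sigmoid(z) - label.
    errors = [1.0 / (1.0 + math.exp(-logit(row))) - label
              for row, label in zip(X, y)]
    for j in range(11):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / len(X)
        weights[j] -= learning_rate * grad_j
    bias -= learning_rate * sum(errors) / len(X)
    losses.append(mean_loss())

print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

The initial loss is log(2) because all logits start at zero, and it decreases as the weights are updated. With purely random labels the decrease is small, since there is nothing real to learn, but the point is that the loss now moves at all.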
For more detail, please refer to the Cross Entropy Jungle question on SO, which describes all these helper functions in TF (and other libraries) along with their different requirements and use cases.