I am trying to adapt this MNIST example to binary classification. But when I change NLABELS from NLABELS=2 to NLABELS=1, the loss function always returns 0 (and accuracy 1).
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('data', one_hot=True)

NLABELS = 2

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
W = tf.Variable(tf.zeros([784, NLABELS]), name='weights')
b = tf.Variable(tf.zeros([NLABELS], name='bias'))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Add summary ops to collect data
_ = tf.histogram_summary('weights', W)
_ = tf.histogram_summary('biases', b)
_ = tf.histogram_summary('y', y)

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')

# More name scopes will clean up the graph representation
with tf.name_scope('cross_entropy'):
    cross_entropy = -tf.reduce_mean(y_ * tf.log(y))
    _ = tf.scalar_summary('cross entropy', cross_entropy)
with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(10.).minimize(cross_entropy)
with tf.name_scope('test'):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    _ = tf.scalar_summary('accuracy', accuracy)

# Merge all the summaries and write them out to /tmp/mnist_logs
merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter('logs', sess.graph_def)
tf.initialize_all_variables().run()

# Train the model, and feed in test data and record summaries every 10 steps
for i in range(1000):
    if i % 10 == 0:  # Record summary data and the accuracy
        labels = mnist.test.labels[:, 0:NLABELS]
        feed = {x: mnist.test.images, y_: labels}
        result = sess.run([merged, accuracy, cross_entropy], feed_dict=feed)
        summary_str = result[0]
        acc = result[1]
        loss = result[2]
        writer.add_summary(summary_str, i)
        print('Accuracy at step %s: %s - loss: %f' % (i, acc, loss))
    else:
        batch_xs, batch_ys = mnist.train.next_batch(100)
        batch_ys = batch_ys[:, 0:NLABELS]
        feed = {x: batch_xs, y_: batch_ys}
        sess.run(train_step, feed_dict=feed)
I have checked the dimensions of both batch_ys (fed into y_) and y, and they are both N x 1 matrices when NLABELS=1, so the problem seems to arise before that point. Maybe something to do with the matrix multiplication?
I have actually run into this same problem in a real project, so any help would be appreciated... Thanks!
Deep learning can be used for binary classification, too. In fact, building a neural network that acts as a binary classifier is little different from building one that acts as a regressor.

The use of a single Sigmoid/Logistic neuron in the output layer is the mainstay of a binary classification neural network. This is because the output of a Sigmoid/Logistic function can be conveniently interpreted as the estimated probability (p̂, pronounced "p-hat") that the given input belongs to the "positive" class.
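As a concrete starting point, here is a minimal sketch of that single-sigmoid-output setup, written in the same TF 1.x-era graph style as the question. The variable names are illustrative, and depending on your TensorFlow version the loss op may take (logits, targets) positionally rather than the labels=/logits= keywords shown here:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name='x-input')
y_ = tf.placeholder(tf.float32, [None, 1], name='y-input')   # labels are 0.0 or 1.0

W = tf.Variable(tf.zeros([784, 1]), name='weights')
b = tf.Variable(tf.zeros([1]), name='bias')

logits = tf.matmul(x, W) + b          # a single logit per example
p_hat = tf.nn.sigmoid(logits)         # estimated P(class == 1)

# Pass the raw logits, not sigmoid(logits); the op applies the sigmoid
# internally in a numerically stable way.
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits))

correct = tf.equal(tf.cast(p_hat > 0.5, tf.float32), y_)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)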
The original MNIST example uses a one-hot encoding to represent the labels in the data: this means that if there are NLABELS = 10 classes (as in MNIST), the target output is [1 0 0 0 0 0 0 0 0 0] for class 0, [0 1 0 0 0 0 0 0 0 0] for class 1, etc. The tf.nn.softmax() operator converts the logits computed by tf.matmul(x, W) + b into a probability distribution across the different output classes, which is then compared to the fed-in value for y_.
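To make that comparison concrete, here is a small NumPy illustration (the logit values are made up) of what the graph computes for one example with NLABELS = 2:

import numpy as np

logits = np.array([2.0, -1.0])                    # tf.matmul(x, W) + b for one example
probs = np.exp(logits) / np.sum(np.exp(logits))   # what tf.nn.softmax() computes
# probs ~= [0.9526, 0.0474]

y_true = np.array([1.0, 0.0])                     # one-hot target for class 0
cross_entropy = -np.sum(y_true * np.log(probs))   # ~= 0.0486: non-zero, so there is a gradient to follow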
If NLABELS = 1, this acts as if there were only a single class, and the tf.nn.softmax() op would compute a probability of 1.0 for that class, leading to a cross-entropy of 0.0, since tf.log(1.0) is 0.0 for all of the examples.
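You can see this degenerate case directly in NumPy (the particular logit value does not matter):

import numpy as np

z = np.array([3.7])                           # a single logit, i.e. NLABELS = 1
p = np.exp(z) / np.sum(np.exp(z))             # softmax over one class is always [1.0]
loss = -np.sum(np.array([1.0]) * np.log(p))   # -log(1.0) == 0.0, so the loss is 0 and nothing is learned
# Likewise, tf.argmax over a single column is always 0, so accuracy is reported as 1.0.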
There are (at least) two approaches you could try for binary classification:
The simplest would be to set NLABELS = 2 for the two possible classes, and encode your training data as [1 0] for label 0 and [0 1] for label 1. This answer has a suggestion for how to do that.

Alternatively, you could keep the labels as integers 0 and 1 and use tf.nn.sparse_softmax_cross_entropy_with_logits(), as suggested in this answer. A sketch of both options follows below.
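Here is a minimal sketch of both options, written against the same TF 1.x-era API as the question. The placeholder names are illustrative, and on older TensorFlow versions the cross-entropy ops take (logits, labels) positionally rather than the keyword arguments shown here:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name='x-input')

NLABELS = 2
W = tf.Variable(tf.zeros([784, NLABELS]), name='weights')
b = tf.Variable(tf.zeros([NLABELS]), name='bias')
logits = tf.matmul(x, W) + b

# Option 1: one-hot targets [1 0] / [0 1]. With one_hot=True MNIST labels,
# the question's slicing batch_ys[:, 0:NLABELS] already produces this encoding.
y_onehot = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')
loss_onehot = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot, logits=logits))

# Option 2: integer labels 0/1; the sparse op does the one-hot conversion internally.
y_int = tf.placeholder(tf.int64, [None], name='y-int-input')
loss_sparse = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_int, logits=logits))

# Either loss can be minimized exactly as in the question's training loop.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss_sparse)

Both ops work directly on the logits, so they also avoid the numerical issues of computing -tf.reduce_mean(y_ * tf.log(y)) by hand.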