
Do I need to use one_hot encoding if my output variable is binary?

I am developing a TensorFlow network based on their MNIST for beginners template. Basically, I am trying to implement a simple logistic regression in which 10 continuous variables predict a binary outcome, so my inputs are 10 values between 0 and 1, and my target variable (Y_train and Y_test in the code) is a 1 or 0.

My main problem is that there is no change in accuracy no matter how many training steps I run -- it is 0.276667 whether I run 100 or 31240 steps. Additionally, when I switch from the softmax to a plain matmul to generate my Y values, I get 0.0 accuracy, which suggests there may be something wrong with my x*W + b calculation. The inputs read out just fine.

What I'm wondering is a) whether I'm not calculating the Y values properly because of an error in my code, and b) if that's not the case, whether I need to use one_hot vectors -- even though my output already takes the form of 0 or 1. If the latter is the case, where do I pass one_hot=True when generating my target-value vector? Thanks!

import numpy as np
import tensorflow as tf
train_data = np.genfromtxt("TRAINDATA2.txt", delimiter="    ")
train_input = train_data[:, :10]
train_input = train_input.reshape(31240, 10)
X_train = tf.placeholder(tf.float32, [31240, 10])

train_target = train_data[:, 10]
train_target = train_target.reshape(31240, 1)
Y_train = tf.placeholder(tf.float32, [31240, 1])

test_data = np.genfromtxt("TESTDATA2.txt", delimiter = "    ")
test_input = test_data[:, :10]
test_input = test_input.reshape(7800, 10)
X_test = tf.placeholder(tf.float32, [7800, 10])

test_target = test_data[:, 10]
test_target = test_target.reshape(7800, 1)
Y_test = tf.placeholder(tf.float32, [7800, 1])

W = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))

Y_obt = tf.nn.softmax(tf.matmul(X_train, W) + b)
Y_obt_test = tf.nn.softmax(tf.matmul(X_test, W) + b)

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_obt, labels=Y_train)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(31240):
    sess.run(train_step, feed_dict={X_train: train_input, Y_train: train_target})

correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={X_test: test_input, Y_test: test_target}))
asked Aug 11 '17 by mudstick
1 Answer

Since you map your values to a target with a single element, you should not use softmax cross entropy. The softmax operation transforms its input into a probability distribution whose entries sum to 1; with only one output element, the only way to form such a distribution is to output 1, so the network will simply output 1 every time. Instead, use tf.nn.sigmoid_cross_entropy_with_logits() (which is meant for binary classification), remove the softmax from Y_obt, and change it to tf.sigmoid() for Y_obt_test.
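
A minimal sketch of this first option, reusing the placeholders and variables from the question (X_train, Y_train, X_test, Y_test, W, b); the exact lines are illustrative, not a drop-in fix:

# Keep the raw logits for training -- no softmax on a single output unit.
logits_train = tf.matmul(X_train, W) + b          # shape [31240, 1]
logits_test = tf.matmul(X_test, W) + b            # shape [7800, 1]

# Sigmoid cross entropy treats the single output as an independent binary label.
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=Y_train,
                                                        logits=logits_train)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

# For evaluation, squash the test logits to probabilities and threshold at 0.5.
Y_obt_test = tf.sigmoid(logits_test)
correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))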

Another way is to one-hot encode your targets and use a network with a two-element output. In that case you should use tf.nn.softmax_cross_entropy_with_logits(), but remove the tf.nn.softmax() from Y_obt, since softmax cross entropy expects unscaled logits (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits). For Y_obt_test, of course, you should keep the softmax in this case.
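
A minimal sketch of this second option; the one-hot conversion with np.eye and the two-column shapes are assumptions for illustration, and the "2" suffixes are only there to avoid clashing with names in the question's code:

# One-hot encode the 0/1 targets: 0 -> [1, 0], 1 -> [0, 1].
train_target_onehot = np.eye(2)[train_target.astype(int).ravel()]   # shape [31240, 2]
test_target_onehot = np.eye(2)[test_target.astype(int).ravel()]     # shape [7800, 2]

Y_train2 = tf.placeholder(tf.float32, [31240, 2])
Y_test2 = tf.placeholder(tf.float32, [7800, 2])
W2 = tf.Variable(tf.zeros([10, 2]))
b2 = tf.Variable(tf.zeros([2]))

# Unscaled logits go into the softmax cross entropy...
logits_train2 = tf.matmul(X_train, W2) + b2
cross_entropy2 = tf.nn.softmax_cross_entropy_with_logits(labels=Y_train2,
                                                         logits=logits_train2)

# ...while the softmax is kept for the test-time predictions.
Y_obt_test2 = tf.nn.softmax(tf.matmul(X_test, W2) + b2)
correct_prediction2 = tf.equal(tf.argmax(Y_obt_test2, 1), tf.argmax(Y_test2, 1))
accuracy2 = tf.reduce_mean(tf.cast(correct_prediction2, tf.float32))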

Another thing: it might also help to take the mean of the cross entropies, i.e. cross_entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(...)).
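
In the context of the first sketch above (with logits_train as defined there), that would look roughly like:

# Average the per-example losses into a single scalar before minimizing.
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=Y_train, logits=logits_train))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)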

answered by ml4294