Problems implementing an XOR gate with Neural Nets in Tensorflow

I want to build a trivial neural network that just implements the XOR gate, using the TensorFlow library in Python. For an XOR gate, the only data I train with is the complete truth table; that should be enough, right? I expect it to overfit very quickly. The problem with the code is that the weights and biases do not update. Somehow it still reports 100% accuracy, even with the biases and weights at zero.

import numpy as np
import tensorflow as tf
from itertools import product

x = tf.placeholder("float", [None, 2])
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)

y_ = tf.placeholder("float", [None,1])


print "Done init"

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.75).minimize(cross_entropy)

print "Done loading vars"

init = tf.initialize_all_variables()
print "Done: Initializing variables"

sess = tf.Session()
sess.run(init)
print "Done: Session started"

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1], [0], [0], [0]])


acc=0.0
while acc<0.85:
  for i in range(500):
      sess.run(train_step, feed_dict={x: xTrain, y_: yTrain})


  print b.eval(sess)
  print W.eval(sess)


  print "Done training"


  correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

  accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

  print "Result:"
  acc= sess.run(accuracy, feed_dict={x: xTrain, y_: yTrain})
  print acc

B0 = b.eval(sess)[0]
B1 = b.eval(sess)[1]
W00 = W.eval(sess)[0][0]
W01 = W.eval(sess)[0][1]
W10 = W.eval(sess)[1][0]
W11 = W.eval(sess)[1][1]

for A,B in product([0,1],[0,1]):
  top = W00*A + W01*A + B0
  bottom = W10*B + W11*B + B1
  print "A:",A," B:",B
  # print "Top",top," Bottom: ", bottom
  print "Sum:",top+bottom

I am following the tutorial at http://tensorflow.org/tutorials/mnist/beginners/index.md#softmax_regressions, and in the final for-loop I am printing the results from the matrix (as described in the link).

Can anybody point out my error and what I should do to fix it?

Asked Nov 17 '15 by Cristian F


1 Answer

There are a few issues with your program.

The first issue is that the function you're learning isn't XOR - it's NOR. The lines:

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1], [0], [0], [0]])

...should be:

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[0], [1], [1], [0]])
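
As a quick sanity check (my addition, not from the original answer), the corrected labels can be generated directly with Python's ^ operator, since XOR is 1 exactly when the two inputs differ:

from itertools import product

# XOR truth table: the label is 1 exactly when the inputs differ.
for a, b in product([0, 1], repeat=2):
  print a, b, a ^ b   # prints: 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0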

The next big issue is that the network you've designed isn't capable of learning XOR. You'll need to use a non-linear activation function (such as tf.nn.relu()) and define at least one more layer. For example:

x = tf.placeholder("float", [None, 2])
W_hidden = tf.Variable(...)
b_hidden = tf.Variable(...)
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

W_logits = tf.Variable(...)
b_logits = tf.Variable(...)
logits = tf.matmul(hidden, W_logits) + b_logits

A further issue is that initializing the weights to zero will prevent your network from training: with all-zero weights, every hidden unit computes the same output and receives the same gradient update, so the units can never learn different features. Typically, you should initialize your weights randomly and your biases to zero. Here's one popular way to do it:

HIDDEN_NODES = 2

W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES], stddev=1./math.sqrt(2)))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))

W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2], stddev=1./math.sqrt(HIDDEN_NODES)))
b_logits = tf.Variable(tf.zeros([2]))
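
To see concretely why all-zero weights stall training, here is a small numpy sketch of my own (not from the original answer): with zero weights and biases, every hidden pre-activation is zero, so the ReLU layer outputs zero for every input and all hidden units start out identical, leaving gradient descent no way to push them apart.

import numpy as np

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W_hidden = np.zeros((2, 2))   # all-zero weights, as in the broken version
b_hidden = np.zeros(2)        # all-zero biases

# Every row comes out as [0, 0]: the hidden units are indistinguishable.
hidden = np.maximum(np.dot(xTrain, W_hidden) + b_hidden, 0)
print hidden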

Putting it all together, and using TensorFlow routines for cross-entropy (with a one-hot encoding of yTrain for convenience), here's a program that learns XOR:

import math
import tensorflow as tf
import numpy as np

HIDDEN_NODES = 10

x = tf.placeholder(tf.float32, [None, 2])
W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES], stddev=1./math.sqrt(2)))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2], stddev=1./math.sqrt(HIDDEN_NODES)))
b_logits = tf.Variable(tf.zeros([2]))
logits = tf.matmul(hidden, W_logits) + b_logits

y = tf.nn.softmax(logits)

y_input = tf.placeholder(tf.float32, [None, 2])

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y_input)
loss = tf.reduce_mean(cross_entropy)

train_op = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

init_op = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init_op)

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

for i in xrange(500):
  _, loss_val = sess.run([train_op, loss], feed_dict={x: xTrain, y_input: yTrain})

  if i % 10 == 0:
    print "Step:", i, "Current loss:", loss_val
    for x_input in [[0, 0], [0, 1], [1, 0], [1, 1]]:
      print x_input, sess.run(y, feed_dict={x: [x_input]})

Note that this is probably not the most efficient neural network for computing XOR, so suggestions for tweaking the parameters are welcome!
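
If you also want a final accuracy figure like the one in the question, a check along these lines could be appended after the training loop (a sketch of my own, assuming the graph and session defined above):

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_input, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print "Final accuracy:", sess.run(accuracy, feed_dict={x: xTrain, y_input: yTrain})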

Answered by mrry