 

XOR gate with a neural network

I was trying to implement an XOR gate with TensorFlow. I succeeded, but I don't fully understand why it works. I got help from Stack Overflow posts here and here, so I have versions both with one-hot outputs and without. Here is the network as I understood it, to set things clear:

[Image: neural network visualization]

My Question #1: Notice the ReLU function and the sigmoid function. Why do we need them (specifically the ReLU)? You may say it is to achieve non-linearity, and I understand how ReLU achieves non-linearity; I got that answer from here. From what I understand, the difference between using ReLU and not using ReLU is this (see the picture). [I tested the tf.nn.relu function; its output looks like this.]

[Image: the ReLU function next to a linear function]
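(For reference, a quick test of tf.nn.relu like the one mentioned above can be reproduced with a few lines of TF 1.x code; the sample values are arbitrary:)

import tensorflow as tf

# relu(x) = max(0, x): negative inputs are clamped to 0, positive inputs pass through
values = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(values)))   # [0.  0.  0.  0.5 2. ]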

Now, if the first function works, why not the second one? From my perspective, ReLU achieves non-linearity by combining multiple linear pieces, and both functions in the upper part of the picture are built from linear pieces. If the first one achieves non-linearity, shouldn't the second one too? So the question is: without the ReLU, why does the network get stuck?

XOR gate with one hot true outputs

import numpy as np
import tensorflow as tf

hidden1_neuron = 10

def Network(x, weights, bias):
    # One hidden layer with a ReLU activation, then a linear output layer
    # that produces the logits fed to the softmax cross-entropy below.
    layer1 = tf.nn.relu(tf.matmul(x, weights['h1']) + bias['h1'])
    layer_final = tf.matmul(layer1, weights['out']) + bias['out']
    return layer_final

weight = {
    'h1' : tf.Variable(tf.random_normal([2, hidden1_neuron])),
    'out': tf.Variable(tf.random_normal([hidden1_neuron, 2]))
}
bias = {
    'h1' : tf.Variable(tf.random_normal([hidden1_neuron])),
    'out': tf.Variable(tf.random_normal([2]))
}

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 2])

net = Network(x, weight, bias)

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=net, labels=y)
loss = tf.reduce_mean(cross_entropy)

train_op = tf.train.AdamOptimizer(0.2).minimize(loss)

init_op = tf.initialize_all_variables()

# XOR inputs and one-hot labels: [1, 0] encodes output 0, [0, 1] encodes output 1
xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(5000):
        sess.run(train_op, feed_dict={x: xTrain, y: yTrain})
        loss_val = sess.run(loss, feed_dict={x: xTrain, y: yTrain})
        if(not(i%500)):
            print(loss_val)

    result = sess.run(net, feed_dict={x:xTrain})
    print(result)

The code above implements the XOR gate with one-hot outputs. If I take out tf.nn.relu, the network gets stuck. Why?
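For concreteness, the variant that gets stuck is the same Network with only the activation call removed, so both layers are plain affine maps:

def Network(x, weights, bias):
    # Without the ReLU, layer1 is an affine map and so is the output layer;
    # their composition is still affine, so the model stays linear.
    layer1 = tf.matmul(x, weights['h1']) + bias['h1']
    layer_final = tf.matmul(layer1, weights['out']) + bias['out']
    return layer_final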

My Question #2: How can I tell whether a network is going to get stuck at some local minimum (or at some value)? Is it from the plot of the cost (loss) function? For the network designed above I used cross-entropy as the loss function, but I could not figure out how to plot it. (If you can show this, it would be very helpful.)
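One minimal way to get such a plot, assuming matplotlib is available, is to record the loss at every iteration; the loss_history list below is an addition to the training loop above:

import matplotlib.pyplot as plt

loss_history = []
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(5000):
        _, loss_val = sess.run([train_op, loss], feed_dict={x: xTrain, y: yTrain})
        loss_history.append(loss_val)

# A curve that drops towards 0 means training succeeded; one that flattens
# early at ~0.3466 (= ln(2)/2, i.e. two of the four examples stuck at a
# 50/50 softmax) is the "stuck" case shown in the outputs below.
plt.plot(loss_history)
plt.xlabel('iteration')
plt.ylabel('cross-entropy loss')
plt.show()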

My Question #3: Notice that the code contains the line hidden1_neuron = 10, i.e. I have set the number of neurons in the hidden layer to 10. Reducing the number of neurons to 5 makes the network get stuck. So what should the number of neurons in the hidden layer be?

The output when the network works the way it is supposed to :

2.42076
0.000456363
0.000149548
7.40216e-05
4.34194e-05
2.78939e-05
1.8924e-05
1.33214e-05
9.62602e-06
7.06308e-06
[[ 7.5128479  -7.58900356]
 [-5.65254211  5.28509617]
 [-6.96340656  6.62380219]
 [ 7.26610374 -5.9665451 ]]

The output when the network gets stuck:

1.45679
0.346579
0.346575
0.346575
0.346574
0.346574
0.346574
0.346574
0.346574
0.346574
[[ 15.70696926 -18.21559143]
 [ -7.1562047    9.75774956]
 [ -0.03214722  -0.03214724]
 [ -0.03214722  -0.03214724]]
Asked Jan 01 '16 by Shubhashis

1 Answer

Question 1

Both the ReLU and the sigmoid function are non-linear. By contrast, the function drawn to the right of the ReLU in your picture is linear, and applying multiple linear activation functions still leaves the network linear.

Therefore, the network gets stuck: without the ReLU it is effectively trying to fit a linear model to a non-linear problem.
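To make that concrete, a small NumPy sketch (with arbitrary example matrices) shows that two stacked affine layers collapse into a single affine layer, so the second layer adds no expressive power:

import numpy as np

W1, b1 = np.array([[1., 2.], [3., 4.]]), np.array([0.5, -0.5])
W2, b2 = np.array([[2., 0.], [1., 1.]]), np.array([1.0, 0.0])
x = np.array([0., 1.])

two_layers = (x @ W1 + b1) @ W2 + b2            # hidden layer without ReLU, then output layer
one_layer  = x @ (W1 @ W2) + (b1 @ W2 + b2)     # single equivalent affine layer
print(np.allclose(two_layers, one_layer))       # True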

Question 2

Yes, you will have to pay attention to the progression of the loss (error rate). On larger problem instances you would typically track the error on a held-out test set, by measuring the accuracy of the network after each period of training.
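A sketch of such an accuracy check for the network above (standard TF 1.x ops, reusing net, y, x, xTrain and yTrain from the question's code):

# tf.argmax picks the predicted class from the logits / one-hot labels
correct = tf.equal(tf.argmax(net, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# run inside the training session, e.g. alongside the loss print every 500 steps
print(sess.run(accuracy, feed_dict={x: xTrain, y: yTrain}))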

Question 3

The XOR problem requires at least 2 input, 2 hidden, and 1 output node; that is, five nodes are needed to correctly model the XOR problem with a simple neural network.
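As an illustration of that minimum, here is a hand-crafted (not learned) 2-hidden-unit ReLU network that computes XOR exactly; whether gradient descent finds such weights from a random start is another matter, which is why the 10 hidden units in the question train more reliably than 5:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hidden unit 1 fires when at least one input is 1; hidden unit 2 fires
# only when both are 1. The output subtracts twice the second unit.
W1 = np.array([[1., 1.],
               [1., 1.]])
b1 = np.array([0., -1.])
W2 = np.array([[1.],
               [-2.]])
b2 = np.array([0.])

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(relu(X @ W1 + b1) @ W2 + b2)   # [[0.], [1.], [1.], [0.]] = XOR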

Answered by jorgenkg