
Unable to solve the XOR problem with just two hidden neurons in Python

I have a small, 3-layer neural network with two input neurons, two hidden neurons, and one output neuron. I am trying to stick to the format shown below, using only two hidden neurons.

[Diagram: 2-2-1 network with two inputs, two hidden neurons, and one output]

I am trying to show how this network can behave as an XOR logic gate. However, with just two hidden neurons I get the following poor output after 1,000,000 iterations:

Input: 0 0   Output:  [0.01039096]
Input: 1 0   Output:  [0.93708829]
Input: 0 1   Output:  [0.93599738]
Input: 1 1   Output:  [0.51917667]

If I use three hidden neurons I get a much better output with 100,000 iterations:

Input: 0 0   Output:  [0.01831612]
Input: 1 0   Output:  [0.98558057]
Input: 0 1   Output:  [0.98567602]
Input: 1 1   Output:  [0.02007876]

I am getting a decent output with 3 neurons in the hidden layer but not with two neurons in the hidden layer. Why?

As per a comment below, this repo contains code that solves the XOR problem using two hidden neurons.

I can't figure out what I am doing wrong. Any suggestions are appreciated! Attached is my code:

import numpy as np
import matplotlib
from matplotlib import pyplot as plt


# Sigmoid activation; with deriv=True, x is expected to already be a
# sigmoid output, so x * (1 - x) is the derivative at that point
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))


alpha = 0.7  # learning rate

# Input dataset
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Output dataset
y = np.array([[0, 1, 1, 0]]).T

# seed random numbers to make calculation deterministic
np.random.seed(1)

# initialise weights randomly with mean 0
syn0 = 2 * np.random.random((2, 3)) - 1  # 1st layer of weights synapse 0 connecting L0 to L1
syn1 = 2 * np.random.random((3, 1)) - 1  # 2nd layer of weights synapse 0 connecting L1 to L2

# Randomize inputs for stochastic gradient descent
data = np.hstack((X, y))    # append Input and output dataset
np.random.shuffle(data)     # shuffle
x, y = np.array_split(data, 2, 1)    # Split along vertical(1) axis

for iter in range(100000):
    for i in range(4):
        # forward prop
        layer0 = x[i]  # Input layer
        layer1 = sigmoid(np.dot(layer0, syn0))  # Prediction step for layer 1
        layer2 = sigmoid(np.dot(layer1, syn1))  # Prediction step for layer 2

        layer2_error = y[i] - layer2  # Compare how well layer2's guess was with input

        layer2_delta = layer2_error * sigmoid(layer2, deriv=True)  # Error weighted derivative step

        if iter % 10000 == 0:
            print("Error: ", str(np.mean(np.abs(layer2_error))))
            plt.plot(iter, layer2_error, 'ro')


        # Uses "confidence weighted error" from l2 to establish an error for l1
        layer1_error = layer2_delta.dot(syn1.T)

        layer1_delta = layer1_error * sigmoid(layer1, deriv=True)  # Error weighted derivative step

        # Since SGD we need to dot product two 1D arrays. This is how.
        syn1 += (alpha * np.dot(layer1[:, None], layer2_delta[None, :]))  # Update weights
        syn0 += (alpha * np.dot(layer0[:, None], layer1_delta[None, :]))

    # Training is done above; below we re-run the forward pass on all four inputs to test the network

    layer0 = X  # Input layer
    layer1 = sigmoid(np.dot(layer0, syn0))  # Prediction step for layer 1
    layer2 = sigmoid(np.dot(layer1, syn1))  # Prediction step for layer 2


plt.show()
print("output after training: \n")
print("Input: 0 0 \t Output: ", layer2[0])
print("Input: 1 0 \t Output: ", layer2[1])
print("Input: 0 1 \t Output: ", layer2[2])
print("Input: 1 1 \t Output: ", layer2[3])
asked May 25 '19 by rrz0


2 Answers

This is because you have not included any bias terms for the neurons. You have used only weights to try to fit the XOR model.

In the case of two neurons in the hidden layer, the network under-fits because it cannot compensate for the missing bias.

When you use three neurons in the hidden layer, the extra neuron compensates for the effect of the missing bias.

This is an example of a network for the XOR gate. You'll notice theta (bias) terms added to the hidden layer. This gives the network additional parameters to tweak.

[Diagram: XOR network with theta (bias) terms feeding the hidden layer]
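
For comparison, here is a minimal sketch of the same idea: the question's 2-2-1 sigmoid network with bias vectors added. The names b0/b1, the full-batch update, and the learning rate are my own illustration rather than anything from the question or the linked resources, and convergence can still depend on the random initialisation.

import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)  # x is assumed to already be a sigmoid output
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0, 1, 1, 0]]).T

np.random.seed(1)
alpha = 0.7

# 2-2-1 network: weights plus the bias vectors b0 and b1 (the extra parameters)
syn0 = 2 * np.random.random((2, 2)) - 1
syn1 = 2 * np.random.random((2, 1)) - 1
b0 = np.zeros((1, 2))
b1 = np.zeros((1, 1))

for _ in range(100000):
    # forward pass over the whole batch
    layer1 = sigmoid(X @ syn0 + b0)
    layer2 = sigmoid(layer1 @ syn1 + b1)

    # backward pass
    layer2_delta = (y - layer2) * sigmoid(layer2, deriv=True)
    layer1_delta = layer2_delta @ syn1.T * sigmoid(layer1, deriv=True)

    # the bias gradient is just the delta, summed over the batch
    syn1 += alpha * (layer1.T @ layer2_delta)
    b1 += alpha * layer2_delta.sum(axis=0, keepdims=True)
    syn0 += alpha * (X.T @ layer1_delta)
    b0 += alpha * layer1_delta.sum(axis=0, keepdims=True)

print(layer2)  # the four XOR cases should now approach 0, 1, 1, 0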

Additional resources

answered Oct 11 '22 by skillsmuggler


It is an unsolvable system of equations, which is why the NN cannot solve it either. While it may be an oversimplification, if we say the transfer function is linear, the expression becomes something like

z = (w1*x+w2*y)*w3 + (w4*x+w5*y)*w6

Then there are the 4 cases:

xy=00, z=0 = 0
xy=10, z=1 = w1*w3+w4*w6
xy=01, z=1 = w2*w3+w5*w6
xy=11, z=0 = (w1+w2)*w3 + (w4+w5)*w6

The problem is that

0 = (w1+w2)*w3 + (w4+w5)*w6 = w1*w3+w2*w3 + w4*w6+w5*w6            <-- xy=11 line
                            = w1*w3+w4*w6 + w2*w3+w5*w6 = 1+1 = 2  <-- xy=10 and xy=01 lines

So the seemingly six degrees of freedom are just not enough here; that is why you feel the need to add something extra.
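
To see this numerically (my own illustration, not part of the answer): without bias the linearised network collapses to z = a*x + b*y, with a = w1*w3 + w4*w6 and b = w2*w3 + w5*w6, and even the best least-squares fit over the four XOR cases cannot reach zero error.

import numpy as np

# Linearised, bias-free network: z = a*x + b*y
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
z = np.array([0, 1, 1, 0], dtype=float)

coeffs, residuals, *_ = np.linalg.lstsq(X, z, rcond=None)
print("best-fit a, b:", coeffs)      # roughly [0.33, 0.33]
print("squared error:", residuals)   # about 1.33, i.e. no exact solution exists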

answered Oct 10 '22 by tevemadar