I have a small, 3-layer neural network with two input neurons, two hidden neurons and one output neuron. I am trying to stick to the format below of using only 2 hidden neurons.
I want to show how the network can behave as an XOR logic gate; however, with just two hidden neurons I get the following poor output after 1,000,000 iterations!
Input: 0 0 Output: [0.01039096]
Input: 1 0 Output: [0.93708829]
Input: 0 1 Output: [0.93599738]
Input: 1 1 Output: [0.51917667]
If I use three hidden neurons I get a much better output after only 100,000 iterations:
Input: 0 0 Output: [0.01831612]
Input: 1 0 Output: [0.98558057]
Input: 0 1 Output: [0.98567602]
Input: 1 1 Output: [0.02007876]
I am getting a decent output with 3 neurons in the hidden layer but not with 2. Why?
As per a comment below, this repo contains code that solves the XOR problem using two hidden neurons.
I can't figure out what I am doing wrong. Any suggestions are appreciated! Attached is my code:
import numpy as np
from matplotlib import pyplot as plt

# Sigmoid function (deriv=True expects x to already be a sigmoid output)
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

alpha = 0.7  # learning rate

# Input dataset
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Output dataset
y = np.array([[0, 1, 1, 0]]).T

# Seed random numbers to make the calculation deterministic
np.random.seed(1)

# Initialise weights randomly with mean 0
syn0 = 2 * np.random.random((2, 3)) - 1  # 1st layer of weights, synapse 0, connecting L0 to L1
syn1 = 2 * np.random.random((3, 1)) - 1  # 2nd layer of weights, synapse 1, connecting L1 to L2

# Randomise inputs for stochastic gradient descent
data = np.hstack((X, y))  # append input and output datasets
np.random.shuffle(data)   # shuffle rows
x, y = np.array_split(data, 2, 1)  # split along the vertical (column) axis

for iteration in range(100000):
    for i in range(4):
        # Forward prop
        layer0 = x[i]                           # input layer
        layer1 = sigmoid(np.dot(layer0, syn0))  # prediction step for layer 1
        layer2 = sigmoid(np.dot(layer1, syn1))  # prediction step for layer 2

        layer2_error = y[i] - layer2  # compare layer2's guess against the target
        layer2_delta = layer2_error * sigmoid(layer2, deriv=True)  # error-weighted derivative step

        if iteration % 10000 == 0:
            print("Error: ", str(np.mean(np.abs(layer2_error))))
            plt.plot(iteration, layer2_error, 'ro')

        # Use the "confidence weighted error" from l2 to establish an error for l1
        layer1_error = layer2_delta.dot(syn1.T)
        layer1_delta = layer1_error * sigmoid(layer1, deriv=True)  # error-weighted derivative step

        # For per-sample SGD we need the outer product of two 1D arrays. This is how:
        syn1 += alpha * np.dot(layer1[:, None], layer2_delta[None, :])  # update weights
        syn0 += alpha * np.dot(layer0[:, None], layer1_delta[None, :])

# Training was done above; below we re-run to test the algorithm
layer0 = X                              # input layer
layer1 = sigmoid(np.dot(layer0, syn0))  # prediction step for layer 1
layer2 = sigmoid(np.dot(layer1, syn1))  # prediction step for layer 2

plt.show()
print("Output after training: \n")
print("Input: 0 0 \t Output: ", layer2[0])
print("Input: 1 0 \t Output: ", layer2[1])
print("Input: 0 1 \t Output: ", layer2[2])
print("Input: 1 1 \t Output: ", layer2[3])
This is because you have not included any bias for the neurons. You have only used weights to try to fit the XOR model.

In the case of 2 neurons in the hidden layer, the network under-fits because it can't compensate for the missing bias. When you use 3 neurons in the hidden layer, the extra neuron counters the effect caused by the lack of bias.

This is an example of a network for an XOR gate. You'll notice theta (bias) added to the hidden layers. This gives the network an additional parameter to tweak.
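For illustration, here is a minimal sketch (my own, not the poster's code) of the same 2-2-1 network with bias vectors added to both layers. The learning rate, iteration count and full-batch updates are illustrative assumptions, and convergence on XOR can still depend on the random seed:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0, 1, 1, 0]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((2, 2)) - 1  # input -> hidden weights (2 hidden neurons)
syn1 = 2 * np.random.random((2, 1)) - 1  # hidden -> output weights
b0 = np.zeros((1, 2))                    # hidden-layer bias (the missing "theta")
b1 = np.zeros((1, 1))                    # output-layer bias
alpha = 0.7                              # learning rate (illustrative choice)

for _ in range(20000):
    layer1 = sigmoid(X @ syn0 + b0)       # forward pass, hidden layer
    layer2 = sigmoid(layer1 @ syn1 + b1)  # forward pass, output layer

    # Backprop: deltas use sigmoid'(s) = s * (1 - s) on the activations
    layer2_delta = (y - layer2) * layer2 * (1 - layer2)
    layer1_delta = (layer2_delta @ syn1.T) * layer1 * (1 - layer1)

    syn1 += alpha * layer1.T @ layer2_delta  # weight updates (full batch)
    syn0 += alpha * X.T @ layer1_delta
    b1 += alpha * layer2_delta.sum(axis=0, keepdims=True)  # bias updates
    b0 += alpha * layer1_delta.sum(axis=0, keepdims=True)

print(sigmoid(sigmoid(X @ syn0 + b0) @ syn1 + b1))  # should approach [0, 1, 1, 0]

With biases in place, two hidden neurons are sufficient in principle, because each hidden unit can now shift its activation threshold independently of its weights.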
Additional resources
Without bias it is an unsolvable equation system, which is why the NN cannot solve it either. It may be an oversimplification, but if we say the transfer function is linear, the expression becomes something like
z = (w1*x+w2*y)*w3 + (w4*x+w5*y)*w6
Then there are the four cases:

xy=00:  z = 0 = 0
xy=10:  z = 1 = w1*w3 + w4*w6
xy=01:  z = 1 = w2*w3 + w5*w6
xy=11:  z = 0 = (w1+w2)*w3 + (w4+w5)*w6
The problem is that

0 = (w1+w2)*w3 + (w4+w5)*w6              <-- the xy=11 case
  = (w1*w3 + w4*w6) + (w2*w3 + w5*w6)    <-- regrouping the terms
  = 1 + 1 = 2                            <-- substituting the xy=10 and xy=01 cases

which is a contradiction.
So the seemingly six degrees of freedom are simply not enough here, which is why you experience the need to add something extra (a bias).
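You can sanity-check this with a tiny least-squares computation (my own sketch, using the stand-ins a = w1*w3 + w4*w6 and b = w2*w3 + w5*w6 for illustration): the three nontrivial cases demand a = 1, b = 1 and a + b = 0 simultaneously, and the nonzero residual confirms no exact solution exists:

import numpy as np

# Unknowns: a = w1*w3 + w4*w6 and b = w2*w3 + w5*w6
A = np.array([[1.0, 0.0],   # xy=10 demands a = 1
              [0.0, 1.0],   # xy=01 demands b = 1
              [1.0, 1.0]])  # xy=11 demands a + b = 0
t = np.array([1.0, 1.0, 0.0])

sol, residual, *_ = np.linalg.lstsq(A, t, rcond=None)
print(sol)       # best compromise is roughly a = b = 1/3
print(residual)  # nonzero sum of squared residuals: no exact solution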