I've implemented the following neural network to solve the XOR problem in Python. My neural network consists of an input layer of 2 neurons, 1 hidden layer of 2 neurons and an output layer of 1 neuron. I am using the Sigmoid function as the activation function for the hidden layer and the linear (identity) function as the activation function for the output layer:
import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))

def init_weights(layers, epsilon):
    weights = []
    for i in range(len(layers)-1):
        w = np.random.rand(layers[i+1], layers[i]+1)
        w = w * 2*epsilon - epsilon
        weights.append(np.mat(w))
    return weights

def fit(X, Y, w, predict=False, x=None):
    w_grad = [np.mat(np.zeros(np.shape(w[i]))) for i in range(len(w))]
    for i in range(len(X)):
        x = x if predict else X[0]
        y = Y[0,i]
        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a
        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)
    return [w_grad[i]/len(X) for i in range(len(w))]

def predict(x):
    return fit(X, Y, w, True, x)

####

X = np.mat([[0,0],
            [0,1],
            [1,0],
            [1,1]])
Y = np.mat([0,1,1,0])
layers = [2,2,1]
epochs = 10000
alpha = 0.5

w = init_weights(layers, 1)

for i in range(epochs):
    w_grad = fit(X, Y, w)
    print w_grad
    for j in range(len(w)):
        w[j] -= alpha * w_grad[j]

for i in range(len(X)):
    x = X[i]
    guess = predict(x)
    print x, ":", guess
The backpropagation all seems to be correct; the only issue that comes to mind is some problem with my implementation of the bias units. Either way, all predictions converge to approximately 0.5 for every input each time I run the code. I've scoured the code and can't seem to find what's wrong. Can anyone point out what's wrong with my implementation? I appreciate any feedback.
In case it helps, here's the kind of output I'm getting:
[[0 0]] : [[ 0.5]]
[[0 1]] : [[ 0.49483673]]
[[1 0]] : [[ 0.52006739]]
[[1 1]] : [[ 0.51610963]]
Linearly separable data basically means that you can separate the classes with a point in 1D, a line in 2D, a plane in 3D and so on. A single perceptron can only converge on linearly separable data, and XOR is not linearly separable: a line would have to put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, which is impossible (the constraints w1 + b > 0 and w2 + b > 0 add up to w1 + w2 + 2b > 0, while b <= 0 and w1 + w2 + b <= 0 add up to w1 + w2 + 2b <= 0). Therefore a perceptron on its own isn't capable of imitating the XOR function; you need at least one hidden layer.
Among the logical gates, XOR, also known as the "exclusive or", is the operation on two binary inputs that outputs 1 when exactly one of the inputs is 1 and 0 when the inputs are the same (so 0,0 -> 0; 0,1 -> 1; 1,0 -> 1; 1,1 -> 0).
It is, however, absolutely possible to solve the XOR problem with just two hidden neurons feeding a single output neuron, which is exactly the architecture you are using.
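To see why, here is a minimal standalone sketch with hand-picked weights (chosen purely for illustration, not learned, and not part of your code): the first hidden unit acts as an OR gate, the second as an AND gate, and the output fires when OR is true but AND is not, which is exactly XOR.

import numpy as np

def step(z):
    # hard threshold, used only for this illustration
    return (z > 0).astype(int)

# Hand-picked weights (one of many possible choices):
# hidden unit 1 computes OR(x1, x2), hidden unit 2 computes AND(x1, x2),
# and the output fires for "OR and not AND", which is XOR.
W1 = np.array([[1.0, 1.0],    # OR unit:  x1 + x2 - 0.5 > 0
               [1.0, 1.0]])   # AND unit: x1 + x2 - 1.5 > 0
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])    # output:   h1 - 2*h2 - 0.5 > 0
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.dot(W1, np.array(x)) + b1)
    y = step(np.dot(W2, h) + b2)
    print("%s -> %d" % (x, y))   # prints 0, 1, 1, 0 in order

Training with gradient descent simply has to discover some equivalent arrangement of the hidden units on its own.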
Your implementation of forward propagation and backpropagation is more or less correct; where you're going wrong is quite simple. The first small error is inside your fit function, specifically the first statement inside your for loop:
x = x if predict else X[0]
You are saying that when you aren't predicting (i.e. when you are training), the input example used on every iteration of the training loop is always the first example, [0 0] (i.e. X[0]). This is why you are getting roughly 0.5 for every prediction: the input stays fixed at [0 0] while the targets still cycle through 0, 1, 1, 0, so the best the network can do is output the average of those four targets, which is 0.5. You need to change this line so that it reads the correct example, which is example i:
x = x if predict else X[i]
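As a concrete illustration of that averaging claim (a tiny standalone check, not part of your network code): if a model can only emit one value for the single input it ever sees, the constant that minimizes the mean squared error against the four targets 0, 1, 1, 0 is their mean, 0.5.

import numpy as np

targets = np.array([0.0, 1.0, 1.0, 0.0])
candidates = np.linspace(0.0, 1.0, 101)                  # possible constant outputs
mse = [np.mean((targets - c) ** 2) for c in candidates]
print(candidates[np.argmin(mse)])                        # 0.5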
The last change you need to make is in your s_prime function. Right now you have:

def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))

There are two problems with this. First, the derivative of the sigmoid function is sigmoid(z) * (1 - sigmoid(z)), not sigmoid(z) * sigmoid(1 - z). Second, during forward propagation you have already computed the output activations of each neuron and stored them in a_s, so when you compute the local derivative at these neurons you are supplying those activations directly to s_prime; there is no need to push them through the sigmoid again. Written in terms of an activation a = sigmoid(z), the derivative is simply a * (1 - a). Therefore:

def s_prime(z):
    return np.multiply(z, 1.0-z)
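If you want to convince yourself of this, here is a small standalone check (separate from your network code) showing that the activation-based form a * (1 - a) agrees with a finite-difference estimate of the sigmoid's derivative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
a = sigmoid(z)
analytic = a * (1.0 - a)                                      # derivative written in terms of the activation
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central finite difference
print(np.max(np.abs(analytic - numeric)))                     # on the order of 1e-11, i.e. they agree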
Once these two changes are made, we get this output:
[[0 0]] : [[ 0.00239857]]
[[0 1]] : [[ 0.99816778]]
[[1 0]] : [[ 0.99816596]]
[[1 1]] : [[ 0.0021052]]
You can see that this more or less agrees with the expected output of the XOR gate. One last recommendation: 10000 iterations is far more computation than you need given your current code structure. I noticed that with the above corrections we reach the expected output in fewer iterations, so I decreased the iterations to 1000 and bumped the learning rate alpha up to 0.75. With these two changes we now get:
[[0 0]] : [[ 0.03029435]]
[[0 1]] : [[ 0.95397528]]
[[1 0]] : [[ 0.95371525]]
[[1 1]] : [[ 0.04796917]]
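For reference, the complete set of edits relative to your original code is small; everything else stays exactly as posted. A summary sketch of the changes discussed above:

import numpy as np

def s_prime(z):
    # z is already an activation a = sigmoid(...), so the derivative is a * (1 - a)
    return np.multiply(z, 1.0-z)

# inside fit(), train on the i-th example instead of always the first:
#     x = x if predict else X[i]

# optional: fewer epochs and a larger learning rate still converge comfortably
epochs = 1000
alpha = 0.75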