
Implementing a perceptron with backpropagation algorithm

I am trying to implement a two-layer perceptron with backpropagation to solve the parity problem. The network has 4 binary inputs, 4 hidden units in the first layer and 1 output in the second layer. I am using this for reference, but am having problems with convergence.

First, I will note that I am using a sigmoid function for activation, so the derivative is (from what I understand) sigmoid(v) * (1 - sigmoid(v)). That expression is what I use when calculating the delta values.
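For reference, here is a minimal sketch of the activation and its derivative in Java. The class and method names are hypothetical and are not taken from MLP.java; the derivative is written in terms of the already computed activation y, which is the form the delta formulas further down use.

```java
// Hypothetical helper class; the names are illustrative, not from the original MLP.java.
final class SigmoidDemo {
    /** Logistic sigmoid: 1 / (1 + e^(-v)). */
    static double sigmoid(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    /** Derivative of the sigmoid, expressed through its output y = sigmoid(v). */
    static double sigmoidDerivativeFromOutput(double y) {
        return y * (1.0 - y);
    }

    public static void main(String[] args) {
        double y = sigmoid(0.0);
        // prints 0.5 0.25
        System.out.println(y + " " + sigmoidDerivativeFromOutput(y));
    }
}
```

Passing the activation rather than v avoids recomputing the exponential during backpropagation.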

So, basically I set up the network and run for just a few epochs (go through each possible pattern -- in this case, 16 patterns of input). After the first epoch, the weights are changed slightly. After the second, the weights do not change and remain so no matter how many more epochs I run. I am using a learning rate of 0.1 and a bias of +1 for now.

The process of training the network is below in pseudocode (which I believe to be correct according to sources I've checked):

Feed Forward Step:

v = SUM[weight connecting input to hidden * input value] + bias  
y = Sigmoid(v)  
set hidden.values to y  
v = SUM[weight connecting hidden to output * hidden value] + bias  
y = Sigmoid(v)  
set output value to y
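The feed-forward step above could be sketched in Java roughly as follows. This is only an illustration under assumptions: weights are stored as plain arrays (weights[j][i] connects input i to neuron j) and each neuron has its own bias term, whereas the linked code uses Neuron objects whose layout is not shown here. All names are hypothetical.

```java
// Hypothetical array-based sketch; the original code uses Neuron objects instead.
final class FeedForwardSketch {
    static double sigmoid(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    /** Computes out[j] = sigmoid(sum_i weights[j][i] * in[i] + bias[j]) for one layer. */
    static double[] layer(double[][] weights, double[] bias, double[] in) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double v = bias[j];
            for (int i = 0; i < in.length; i++) {
                v += weights[j][i] * in[i];
            }
            out[j] = sigmoid(v);
        }
        return out;
    }

    /** Input -> hidden -> single output, as in the pseudocode above. */
    static double forward(double[][] wHidden, double[] bHidden,
                          double[][] wOut, double[] bOut, double[] x) {
        double[] hidden = layer(wHidden, bHidden, x);
        return layer(wOut, bOut, hidden)[0];
    }
}
```

For the network in the question, wHidden would be 4x4 and wOut would be 1x4.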

Backpropagation of Output Layer:

error = desired - output.value  
outputDelta = error * output.value * (1 - output.value)

Backpropagation of Hidden Layer:

for each hidden neuron h[i]:  
error = outputDelta * weight connecting h[i] to output  
hiddenDelta[i] = error * h[i].value * (1 - h[i].value)
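The two delta computations above can be written as small Java functions. This is a sketch under the assumption of a single output neuron and array-based storage; the names are hypothetical and not from the linked code.

```java
// Hypothetical sketch of the delta formulas for a network with one output neuron.
final class DeltaSketch {
    /** outputDelta = (desired - output) * output * (1 - output). */
    static double outputDelta(double desired, double output) {
        return (desired - output) * output * (1.0 - output);
    }

    /**
     * hiddenDelta[i] = outputDelta * (weight from hidden i to output)
     *                  * hidden[i] * (1 - hidden[i]).
     */
    static double[] hiddenDeltas(double outputDelta, double[] hiddenToOutput, double[] hidden) {
        double[] deltas = new double[hidden.length];
        for (int i = 0; i < hidden.length; i++) {
            double error = outputDelta * hiddenToOutput[i];
            deltas[i] = error * hidden[i] * (1.0 - hidden[i]);
        }
        return deltas;
    }
}
```

Note that the hidden deltas should be computed from the hidden-to-output weights as they were during the forward pass, that is, before those weights are updated.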

Update Weights:

for each hidden neuron h connected to the output layer  
h.weight connecting h to output = learningRate * outputDelta * h.value

for each input neuron x connected to the hidden layer  
x.weight connecting x to h[i] = learningRate * hiddenDelta[i] * x.value

This process is of course looped through the epochs and the weight changes persist. So, my question is, are there any reasons that the weights remain constant after the second epoch? If necessary I can post my code, but at the moment I am hoping for something obvious that I'm overlooking. Thanks all!

EDIT: Here are the links to my code as suggested by sarnold:
MLP.java: http://codetidy.com/1903
Neuron.java: http://codetidy.com/1904
Pattern.java: http://codetidy.com/1905
input.txt: http://codetidy.com/1906

Aaron asked Oct 09 '22 18:10


1 Answer

I think I spotted the problem. Funnily enough, it is visible in your high-level description, but I only noticed it when something looked odd in the code. First, the description:

for each hidden neuron h connected to the output layer
h.weight connecting h to output = learningRate * outputDelta * h.value

for each input neuron x connected to the hidden layer
x.weight connecting x to h[i] = learningRate * hiddenDelta[i] * x.value

I believe h.weight should be updated relative to its previous value. Your update overwrites it with a value based only on the learning rate, the output delta, and the value of the node, so the previous weight is discarded. Similarly, x.weight is overwritten using only the learning rate, the hidden delta, and the value of the node. The same assignments appear in the code:

    /*** Weight updates ***/

    // update weights connecting hidden neurons to output layer
    for (i = 0; i < output.size(); i++) {
        for (Neuron h : output.get(i).left) {
            h.weights[i] = learningRate * outputDelta[i] * h.value;
        }
    }

    // update weights connecting input neurons to hidden layer
    for (i = 0; i < hidden.size(); i++) {
        for (Neuron x : hidden.get(i).left) {
            x.weights[i] = learningRate * hiddenDelta[i] * x.value;
        }
    }

I do not know for certain what the correct solution is, but I have two suggestions:

  1. Replace these lines:

            h.weights[i] = learningRate * outputDelta[i] * h.value;
            x.weights[i] = learningRate * hiddenDelta[i] * x.value;
    

    with these lines:

            h.weights[i] += learningRate * outputDelta[i] * h.value;
            x.weights[i] += learningRate * hiddenDelta[i] * x.value;
    

    (+= instead of =.)

  2. Replace these lines:

            h.weights[i] = learningRate * outputDelta[i] * h.value;
            x.weights[i] = learningRate * hiddenDelta[i] * x.value;
    

    with these lines:

            h.weights[i] *= learningRate * outputDelta[i];
            x.weights[i] *= learningRate * hiddenDelta[i];
    

    (Ignore the value and simply scale the existing weight. The learning rate should be 1.05 instead of .05 for this change.)
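Of the two suggestions, the first (+=) matches the usual gradient-descent update for backpropagation, where each weight is incremented by learningRate * delta * input value. Below is a minimal stand-alone sketch of a 4-4-1 network trained that way. It is an illustration under assumptions, not a fix of MLP.java: it uses plain arrays instead of the Neuron and Pattern classes, treats the bias as an extra weight on a constant +1 input, and initializes weights uniformly in [-0.5, 0.5). All names are hypothetical.

```java
// Hypothetical stand-alone sketch; it does not reuse Neuron or Pattern from the question.
final class ParityMlpSketch {
    static final int IN = 4, HID = 4;
    final double[][] wHidden = new double[HID][IN + 1]; // last column: bias weight
    final double[] wOut = new double[HID + 1];          // last entry: bias weight
    final double[] hidden = new double[HID];
    double output;

    ParityMlpSketch(long seed) {
        java.util.Random rng = new java.util.Random(seed);
        for (double[] row : wHidden) {
            for (int i = 0; i < row.length; i++) row[i] = rng.nextDouble() - 0.5;
        }
        for (int j = 0; j < wOut.length; j++) wOut[j] = rng.nextDouble() - 0.5;
    }

    static double sigmoid(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    /** Feed-forward pass; stores hidden activations and the output for later use. */
    double forward(double[] x) {
        double vOut = wOut[HID]; // bias input is fixed at +1
        for (int j = 0; j < HID; j++) {
            double v = wHidden[j][IN];
            for (int i = 0; i < IN; i++) v += wHidden[j][i] * x[i];
            hidden[j] = sigmoid(v);
            vOut += wOut[j] * hidden[j];
        }
        output = sigmoid(vOut);
        return output;
    }

    /** One online backpropagation step; weights are adjusted with +=, not overwritten. */
    void train(double[] x, double desired, double learningRate) {
        forward(x);
        double outputDelta = (desired - output) * output * (1.0 - output);
        for (int j = 0; j < HID; j++) {
            // The hidden delta uses wOut[j] as it was before this step's update.
            double hiddenDelta = outputDelta * wOut[j] * hidden[j] * (1.0 - hidden[j]);
            wOut[j] += learningRate * outputDelta * hidden[j];
            for (int i = 0; i < IN; i++) wHidden[j][i] += learningRate * hiddenDelta * x[i];
            wHidden[j][IN] += learningRate * hiddenDelta;
        }
        wOut[HID] += learningRate * outputDelta;
    }

    public static void main(String[] args) {
        ParityMlpSketch net = new ParityMlpSketch(42L);
        for (int epoch = 0; epoch < 3; epoch++) {
            for (int p = 0; p < 16; p++) { // all 16 four-bit patterns
                double[] x = new double[IN];
                for (int i = 0; i < IN; i++) x[i] = (p >> i) & 1;
                net.train(x, Integer.bitCount(p) % 2, 0.1);
            }
            System.out.println("epoch " + epoch + " wOut[0] = " + net.wOut[0]);
        }
    }
}
```

Because each step adds to the existing weight, the printed weight keeps moving from epoch to epoch instead of being reset by every pattern. Whether this small network actually converges on parity still depends on the initial weights, the learning rate, and the number of epochs.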

sarnold answered Oct 12 '22 11:10