Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Neural Network Back-Propagation Algorithm Gets Stuck on XOR Training PAttern

Overview

So I'm trying to get a grasp on the mechanics of neural networks. I still don't totally grasp the math behind it, but I think I understand how to implement it. I currently have a neural net that can learn AND, OR, and NOR training patterns. However, I can't seem to get it to implement the XOR pattern. My feed forward neural network consists of 2 inputs, 3 hidden, and 1 output. The weights and biases are randomly set between -0.5 and 0.5, and outputs are generated with the sigmoidal activation function

Algorithm

So far, I'm guessing I made a mistake in my training algorithm which is described below:

  1. For each neuron in the output layer, provide an error value that is the desiredOutput - actualOutput --go to step 3
  2. For each neuron in a hidden or input layer (working backwards) provide an error value that is the sum of all forward connection weights * the errorGradient of the neuron at the other end of the connection --go to step 3
  3. For each neuron, using the error value provided, generate an error gradient that equals output * (1-output) * error. --go to step 4
  4. For each neuron, adjust the bias to equal current bias + LEARNING_RATE * errorGradient. Then adjust each backward connection's weight to equal current weight + LEARNING_RATE * output of neuron at other end of connection * this neuron's errorGradient

I'm training my neural net online, so this runs after each training sample.

Code

This is the main code that runs the neural network:

private void simulate(double maximumError) {

    int errorRepeatCount = 0;
    double prevError = 0;

    double error; // summed squares of errors
    int trialCount = 0;

    do {

        error = 0;

        // loop through each training set
        for(int index = 0; index < Parameters.INPUT_TRAINING_SET.length; index++) {

            double[] currentInput = Parameters.INPUT_TRAINING_SET[index];
            double[] expectedOutput = Parameters.OUTPUT_TRAINING_SET[index];
            double[] output = getOutput(currentInput);

            train(expectedOutput);

            // Subtracts the expected and actual outputs, gets the average of those outputs, and then squares it.
            error += Math.pow(getAverage(subtractArray(output, expectedOutput)), 2); 



        }

    } while(error > maximumError);

Now the train() function:

public void train(double[] expected) {

    layers.outputLayer().calculateErrors(expected);

    for(int i = Parameters.NUM_HIDDEN_LAYERS; i >= 0; i--) {
        layers.allLayers[i].calculateErrors();
    }

}

Output layer calculateErrors() function:

public void calculateErrors(double[] expectedOutput) {

    for(int i = 0; i < numNeurons; i++) {

        Neuron neuron = neurons[i];
        double error = expectedOutput[i] - neuron.getOutput();
        neuron.train(error);

    }

}

Normal (Hidden & Input) layer calculateErrors() function:

public void calculateErrors() {

    for(int i = 0; i < neurons.length; i++) {

        Neuron neuron = neurons[i];

        double error = 0;

        for(Connection connection : neuron.forwardConnections) {

            error += connection.output.errorGradient * connection.weight;

        }

        neuron.train(error);

    }

}

Full Neuron class:

package neuralNet.layers.neurons;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import neuralNet.Parameters;
import neuralNet.layers.NeuronLayer;

public class Neuron {

private double output, bias;
public List<Connection> forwardConnections = new ArrayList<Connection>(); // Forward = layer closer to input -> layer closer to output
public List<Connection> backwardConnections = new ArrayList<Connection>(); // Backward = layer closer to output -> layer closer to input

public double errorGradient;
public Neuron() {

    Random random = new Random();
    bias = random.nextDouble() - 0.5;

}

public void addConnections(NeuronLayer prevLayer) {

    // This is true for input layers. They create their connections differently. (See InputLayer class)
    if(prevLayer == null) return;

    for(Neuron neuron : prevLayer.neurons) {

        Connection.createConnection(neuron, this);

    }

}

public void calcOutput() {

    output = bias;

    for(Connection connection : backwardConnections) {

        connection.input.calcOutput();
        output += connection.input.getOutput() * connection.weight;

    }

    output = sigmoid(output);

}

private double sigmoid(double output) {
    return 1 / (1 + Math.exp(-1*output));
}

public double getOutput() {
    return output;
}

public void train(double error) {

    this.errorGradient = output * (1-output) * error;

    bias += Parameters.LEARNING_RATE * errorGradient;

    for(Connection connection : backwardConnections) {

        // for clarification: connection.input refers to a neuron that outputs to this neuron
        connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;

    }

}

}

Results

When I'm training for AND, OR, or NOR the network can usually converge within about 1000 epochs, however when I train with XOR, the outputs become fixed and it never converges. So, what am I doing wrong? Any ideas?

Edit

Following the advice of others, I started over and implemented my neural network without classes...and it works. I'm still not sure where my problem lies in the above code, but it's in there somewhere.

like image 511
williamg Avatar asked Feb 20 '12 22:02

williamg


2 Answers

This is surprising because you are using a big enough network (barely) to learn XOR. Your algorithm looks right, so I dont really know what is going on. It might help to know how you generate your training data: are you just reating the samples (1,0,1),(1,1,0),(0,1,1),(0,0,0) or something like that over and over? Perhaps the problem is that stochastic gradient descent is causing you to jump around the stabilizing minima. You could try some things to fix this: perhaps randomly sample from your training examples instead of repeating them (if that is what you are doing). Or, alternatively, you could modify your learning algorithm:

currently you have something equivalent to:

weight(epoch) = weight(epoch - 1) + deltaWeight(epoch)
deltaWeight(epoch) = mu * errorGradient(epoch)

where mu is the learning rate

One option is to very slowly decrease the value of mu.

An alternative would be to change your definition of deltaWeight to include a "momentum"

deltaWeight(epoch) = mu * errorGradient(epoch) + alpha * deltaWeight(epoch -1)

where alpha is the momentum parameter (between 0 and 1).

Visually, you can think of gradient descent as trying to find the minimum point of a curved surface by placing an object on that surface, and then step by step moving that object small amounts in which ever directing is sloping down based on where it is currently located. The problem is that you dont really do gradient descent: instead you do stochastic gradient descent where you move in direction by sampling from a set of training vectors and moving in what ever direction the sample makes look like is down. On average over the entire training data, stochastic gradient descent should work, but it is isn't guaranteed to because you can get into a situation where you jump back and forth never making progress. Slowly decreasing the learning rate means you take smaller and smaller steps each time so can not get stuck in an infinite cycle.

On the other hand, momentum makes the algorithm into something akin to rolling a rubber ball. As the ball roles it tends to go in the downwards direction, but it also tends to keep going in the direction it was going before, and if it is ever on a stretch where the down slope is in the same direction for a while it will speed up. The ball will therefore jump over some local minima, and it will be more resilient against stepping back and forth over the target because doing so means working against the force of momentum.


Having some code and thinking about this some more, it is pretty clear that your problem is in training the early layers. The functions you have successfully learned are all linearly separable, so it would make sense that only a single layer is being properly learned. I agree with LiKao about implementation strategies in general, although your approach should work. My suggestion for how to debug this is figure out what the progression of the weights on the connections between the input layer and the output layer looks like.

You should post the rest implementation of Neuron.

like image 56
Philip JF Avatar answered Nov 14 '22 03:11

Philip JF


I faced the same problem short time ago. Finally I found the solution, how to write a code solving XOR wit the MLP algorithm.

The XOR problem seems to be an easy to learn problem but it isn't for the MLP because it's not linearly separable. So even if your MLP is OK (I mean there is no bug in your code) you have to find the good parameters to be able to learn the XOR problem.

Two hidden and one output neuron is fine. The 2 main thing you have to set:

  • although you have only 4 training samples you have to run the training for a couple of thousands epoch.
  • if you use sigmoid hidden layers but linear output the network will converge faster

Here is the detailed description and sample code: http://freeconnection.blogspot.hu/2012/09/solving-xor-with-mlp.html

like image 28
mrbo Avatar answered Nov 14 '22 05:11

mrbo