Neural Network Backpropagation implementation issues

I've been reading up quite a bit on neural networks and training them with backpropagation, primarily this Coursera course, with additional reading from here and here. I thought I had a pretty solid grasp of the core algorithm, but my attempt to build a backpropagation-trained neural net hasn't quite worked out and I'm not sure why.

The code is in C++ with no vectorisation as of yet.

I wanted to build a simple network with 2 input neurons, 1 hidden neuron and 1 output neuron to model the AND function, just to understand how the concepts worked before moving on to a more complex example. My forward propagation code worked when I hand-coded the values for the weights and biases.

float NeuralNetwork::ForwardPropagte(const float *dataInput)
{
    // Write the input data into the input layer
    int number = 0;
    for (auto & node : m_Network[0])
    {
        node->input = dataInput[number++];
    }

    // For each layer in the network
    int layerIndex = 0;
    for (auto & layer : m_Network)
    {
        // For each neuron in the layer
        for (auto & neuron : layer)
        {
            float activation;
            if (layerIndex != 0)
            {
                // Hidden/output layers: add the bias and squash through the sigmoid
                neuron->input += neuron->bias;
                activation = Sigmoid(neuron->input);
            }
            else
            {
                // Input layer: pass the raw input straight through
                activation = neuron->input;
            }

            // Accumulate the weighted activation into each downstream neuron's input
            for (auto & pair : neuron->outputNeuron)
            {
                pair.first->input += static_cast<float>(pair.second) * activation;
            }
        }
        ++layerIndex;
    }

    return Sigmoid(m_Network[m_Network.size() - 1][0]->input);
}

Some of these variables are fairly poorly named, but basically neuron->outputNeuron is a vector of pairs: the first element is a pointer to the next neuron and the second is the weight value. neuron->input is the "z" value in the neural network equation, the sum of all the weights*activations plus the bias. Sigmoid is given by:

float NeuralNetwork::Sigmoid(float value) const
{
    return 1.0f/(1.0f + exp(-value));
}

These two appear to work as intended. After a pass over the network (and again after backpropagation) all the 'z' or 'neuron->input' values are reset to zero.
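
For reference, the snippets assume a per-neuron structure along these lines. This is only a simplified sketch inferred from the code above; the exact layout in my code may differ:

#include <vector>
#include <utility>

// Assumed shape of the per-neuron data the snippets operate on
struct Neuron
{
    float input = 0.0f;   // the "z" value: sum of incoming weight*activation, plus the bias
    float bias  = 0.0f;
    float error = 0.0f;   // the delta computed during backpropagation

    // Outgoing connections: a pointer to a neuron in the next layer and the weight on that edge
    std::vector<std::pair<Neuron*, float>> outputNeuron;
};

// The network itself would then be something like:
// std::vector<std::vector<Neuron*>> m_Network;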

I then train the network following the pseudo-code below. The training code is run multiple times.

for trainingExample=0 to m // m = number of training examples
   perform forward propagation to calculate hyp(x)
   calculate cost delta of last layer
         delta = y - hyp(x)
   use the delta of the output to calculate delta for all layers
   move over the network adjusting the weights based on this value
   reset network
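
For comparison, the textbook per-example gradient-descent update that this pseudo-code is aiming for can be written out for a single sigmoid neuron as follows. This is just a standalone sketch (with an assumed learning rate, which my network code does not currently use), not the code under test:

#include <cmath>

// One gradient-descent step for a lone sigmoid neuron with squared-error cost
void UpdateSingleNeuron(float input, float target,
                        float &weight, float &bias, float learningRate)
{
    float z = weight * input + bias;
    float a = 1.0f / (1.0f + std::exp(-z));      // forward pass

    float delta = (a - target) * a * (1.0f - a); // dCost/dz for 0.5*(a - target)^2

    weight -= learningRate * delta * input;      // subtract the gradient
    bias   -= learningRate * delta;              // bias acts like a weight on a constant input of 1
}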

The actual code is here:

void NeuralNetwork::TrainNetwork(const std::vector<std::pair<std::pair<float,float>,float>> & trainingData)
{
    for (int i = 0; i < 100; ++i)
    {
        for (auto & trainingSet : trainingData)
        {
            float x[2] = {trainingSet.first.first,trainingSet.first.second};
            float y      = trainingSet.second;
            float estimatedY = ForwardPropagte(x);

            m_Network[m_Network.size()-1][0]->error = estimatedY - y;
            CalculateError();
            RunBackpropagation();
            ResetActivations();
        }
    }
}

With the backpropagation function given by:

void NeuralNetwork::RunBackpropagation()
{
    for (int index = m_Network.size()-1; index >= 0; --index)
    {
        for(auto &node : m_Network[index])
        {
            // Again where the "outputNeuron" is a list of the next layer of neurons and associated weights
            for (auto &weight : node->outputNeuron)
            {
                weight.second += weight.first->error*Sigmoid(node->input);
            }
            node->bias = node->error; // I'm not sure how to adjust the bias, some of the formulas seemed to point to this. Is it correct?
        }
    }
}

and the cost calculated by:

void NeuralNetwork::CalculateError()
{
    for (int index = m_Network.size()-2; index > 0; --index)
    {
        for(auto &node : m_Network[index])
        {
            node->error = 0.0f;

            float sigmoidPrime = Sigmoid(node->input)*(1 - Sigmoid(node->input));

            for (auto &weight : node->outputNeuron)
            {
                node->error += (weight.first->error*weight.second)*sigmoidPrime;
            }
        }
    }   
}
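
For reference, the sigmoidPrime computed inline above is the standard derivative sigma'(z) = sigma(z) * (1 - sigma(z)). Factored out as a helper it would look something like this (SigmoidPrime is not a function in my current code, just shown for clarity):

float NeuralNetwork::SigmoidPrime(float value) const
{
    const float s = Sigmoid(value);
    return s * (1.0f - s);
}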

I randomize the weights and run it on the data set:

    x = {0.0f,0.0f} y =0.0f
    x = {1.0f,0.0f} y =0.0f
    x = {0.0f,1.0f} y =0.0f
    x = {1.0f,1.0f} y =1.0f 
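
Passed in as the nested pairs that TrainNetwork expects, that data set looks like this (the variable names here are just illustrative, and constructing the network object is omitted):

std::vector<std::pair<std::pair<float, float>, float>> andTrainingData = {
    {{0.0f, 0.0f}, 0.0f},
    {{1.0f, 0.0f}, 0.0f},
    {{0.0f, 1.0f}, 0.0f},
    {{1.0f, 1.0f}, 1.0f},
};

network.TrainNetwork(andTrainingData);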

Of course I shouldn't be training and testing with the same data set, but I just wanted to get the basic backpropagation algorithm up and running. When I run this code I see the weights/biases are as follows:

Layer 0
    Bias 0.111129
    NeuronWeight 0.058659
    Bias -0.037814
    NeuronWeight -0.018420
Layer 1
    Bias 0.016230
    NeuronWeight -0.104935
Layer 2
    Bias 0.080982

The training set runs and the mean squared error of delta[outputLayer] looks something like:

Error: 0.156954
Error: 0.152529
Error: 0.213887
Error: 0.305257
Error: 0.359612
Error: 0.373494
Error: 0.374910
Error: 0.374995
Error: 0.375000

... remains at this value for ever...

And the final weights look like this (they always end up at roughly these values):

Layer 0
    Bias 0.000000
    NeuronWeight 15.385233
    Bias 0.000000
    NeuronWeight 16.492933
Layer 1
    Bias 0.000000
    NeuronWeight 293.518585
Layer 2
    Bias 0.000000

I accept that this may seem like quite a roundabout way of learning neural networks, and the implementation is (at the moment) far from optimal. But can anyone spot any point where I make an invalid assumption, or where either the implementation or the formula is wrong?

EDIT

Thanks for the feedback on the bias values: I stopped them being applied to the input layer and stopped passing the input layer through the sigmoid function. Additionally, my sigmoid-prime function was invalid. But the network still isn't working. I've updated the error and output above with what happens now.

asked Nov 17 '15 by Davors72



2 Answers

As lejilot said, you have a lot of biases there. You don't need a bias in the last layer: it's an output layer, and a bias must be connected to a neuron's input, not to its output. Take a look at the following image:

In this image you can see that there is just one bias per layer, except for the last one, where there is no need for a bias.

Here you can read a very intuitive approach to neural networks. It is written in Python, but it can help you understand some neural network concepts better.
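
For the 2-1-1 network in the question, the point about biases can be written out directly like this. It is only an illustrative sketch; the weight and bias names are made up rather than taken from the question's code:

#include <cmath>

// One bias feeds each layer's *inputs*: the first bias contributes to the hidden
// neuron's z, the second to the output neuron's z. The output layer needs no
// bias of its own hanging off its output.
float ForwardAnd(float x1, float x2,
                 float wX1H, float wX2H, float biasToHidden,
                 float wHO, float biasToOutput)
{
    auto sigmoid = [](float z) { return 1.0f / (1.0f + std::exp(-z)); };

    float hiddenA = sigmoid(wX1H * x1 + wX2H * x2 + biasToHidden);
    float outputA = sigmoid(wHO * hiddenA + biasToOutput);
    return outputA;
}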

answered Sep 29 '22 by Guillem Cucurull


I solved my issue (beyond the initial bias/sigmoid-prime issues above): I started subtracting from, instead of adding to, the weights. The sources I looked at had a minus sign inside the delta calculation which I don't have, but I had kept their format of adding the (supposedly negated) value to the weights. Additionally, I was puzzled about what to do with the bias and misread one source which said to assign the error to it. I see now that the intuition is to treat the bias like a normal weight, but one whose input is the constant 1 instead of z. After adding these changes, iterating over the training set ~1000 times could model simple bitwise expressions like OR and AND.
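
A minimal sketch of those two changes applied to the RunBackpropagation function from the question looks roughly like this (the learning rate is an assumed value, and the exact final code may differ):

void NeuralNetwork::RunBackpropagation()
{
    const float learningRate = 0.5f; // assumed value, not taken from the original post

    for (int index = m_Network.size() - 1; index >= 0; --index)
    {
        for (auto & node : m_Network[index])
        {
            // Activation of this node: raw input for the input layer, sigmoid otherwise
            float activation = (index == 0) ? node->input : Sigmoid(node->input);

            // Subtract the gradient contribution instead of adding it
            for (auto & weight : node->outputNeuron)
            {
                weight.second -= learningRate * weight.first->error * activation;
            }

            // The bias behaves like a weight whose input is the constant 1
            if (index != 0)
            {
                node->bias -= learningRate * node->error;
            }
        }
    }
}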

answered Sep 29 '22 by Davors72