I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn 2 simple polynomial functions using backpropagation.
But instead of converging, it is diverging (the output is becoming infinity).
Here are some more details about what I've tried. The two functions are:

3 * i + 7 * j + 9
and
j*j + i*i + 24

(I am giving the input layer i and j as inputs.) I have checked and rechecked my code, but there doesn't seem to be any kind of issue with it.
So here's my question: what is going wrong here?
Any pointers will be appreciated.
The most obvious reason for neural network code to diverge is that the coder has forgotten the negative sign in the weight-update expression: the weight must move against the gradient, not along it. Another possible cause is a mistake in the error expression used for calculating the gradients. If neither of these is the issue, we would need to see the code to answer.
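As a minimal sketch of what the correct sign looks like (plain C++ rather than your CUDA kernels, with illustrative names since the question shows no code):

```cpp
// Gradient-descent update for a single weight. "dE_dw" is the partial
// derivative of the error with respect to the weight; all names here are
// illustrative assumptions, not taken from the asker's code.
double update_weight(double w, double dE_dw, double learning_rate) {
    // Correct: step AGAINST the gradient. Writing "+" here instead of "-"
    // makes each step increase the error, and the outputs blow up to infinity.
    return w - learning_rate * dE_dw;
}
```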
The amount of training data may also be too low, or the data being pushed into the model may be corrupted or collected without regard for data integrity. The activation function used in the network often gives good results, but if the target function is more complex the model can still fail to converge.
Divergence allows one neuron to communicate with many other neurons in a network. Convergence allows a neuron to receive input from many neurons in a network.
Input normalization: this is one of the most helpful methods for making a neural network converge faster. In many learning problems, training is faster when the training data sum to zero. You can normalize the input data by subtracting the mean value from each input variable, as in the sketch below.
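A minimal sketch of that mean-centering step, assuming the samples of one input variable are held in a std::vector (an illustrative layout, not from the answer):

```cpp
#include <numeric>
#include <vector>

// Subtract the mean from one input variable, in place.
void mean_center(std::vector<double>& values) {
    if (values.empty()) return;
    double mean = std::accumulate(values.begin(), values.end(), 0.0)
                  / values.size();
    for (double& v : values)
        v -= mean;  // after this loop the variable sums (approximately) to zero
}
```

Apply it separately to each input variable (here, to all the i values and to all the j values) before training.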
If the problem you are trying to solve is a classification problem, try a 3-layer network (3 layers are enough, according to Kolmogorov). Connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space, and that line divides the correct and incorrect half-spaces (see the sketch below). The connections from the hidden layer to the output put the hidden-layer values in correlation with each other, giving you the desired output.
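A minimal sketch of that line/half-space idea, with illustrative names and a hypothetical threshold (nothing here comes from the asker's code):

```cpp
// Hidden node C computes a weighted sum of inputs A and B; comparing it to a
// threshold tells which side of the line A*wa + B*wb = threshold the point
// (A, B) falls on, i.e. which half-space it belongs to.
bool on_positive_side(double A, double B,
                      double wa, double wb, double threshold) {
    double C = A * wa + B * wb;   // the hidden node's pre-activation
    return C > threshold;         // which half-space the input lies in
}
```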
Depending on your data, the error function may look like a hair comb, so implementing momentum should help (a sketch follows this paragraph). Keeping the learning rate at 1 proved optimal for me.
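A minimal sketch of a momentum update for one weight; the struct layout and the 0.9 coefficient are illustrative assumptions, not from the answer:

```cpp
struct Weight {
    double value    = 0.0;
    double velocity = 0.0;  // running blend of previous updates
};

void momentum_step(Weight& w, double dE_dw,
                   double learning_rate, double momentum = 0.9) {
    // Each step reuses part of the previous step, which smooths out the
    // "hair comb" ripples in the error surface instead of bouncing on them.
    w.velocity = momentum * w.velocity - learning_rate * dE_dw;
    w.value += w.velocity;
}
```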
Your training sessions will get stuck in local minima every once in a while, so training will consist of several successive sessions. If a session exceeds the maximum number of iterations, or the amplitude is too high, or the error is obviously high, the session has failed: start another.
At the beginning of each session, reinitialize your weights with random values in (-0.5, +0.5), as sketched below.
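A minimal sketch of that per-session reinitialization; the engine choice and the flat weight vector are illustrative assumptions:

```cpp
#include <random>
#include <vector>

// Give every weight a fresh random value in (-0.5, +0.5) so the next
// training session starts away from the previous local minimum.
void reinitialize(std::vector<double>& weights, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-0.5, 0.5);
    for (double& w : weights)
        w = dist(rng);
}
```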
It really helps to chart your error descent. You will get that "Aha!" moment.
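One easy way to do that, as a sketch (the CSV format and file handling are illustrative assumptions), is to log the error once per epoch and plot the file afterwards:

```cpp
#include <cstdio>

// Append one "epoch,error" row per training epoch; the resulting CSV can be
// charted in any plotting tool to watch the descent (or the divergence).
void log_error(std::FILE* f, int epoch, double error) {
    std::fprintf(f, "%d,%f\n", epoch, error);
}
```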