I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn 2 simple polynomial functions using backpropagation.
But instead of converging, it is diverging (the output is becoming infinity).
Here are some more details about what I've tried. The two functions are:

3 * i + 7 * j + 9
and
j*j + i*i + 24

(I am giving the input layer i and j as inputs.) I have checked and rechecked my code, but there doesn't seem to be any kind of issue with it.
So here's my question: what is going wrong here?
Any pointers will be appreciated.
The most obvious reason for neural network code to diverge is that the coder has forgotten the negative sign in the weight-update expression: the weight must move against the gradient, not along it. Another possible cause is a mistake in the error expression used for calculating the gradients. If neither of these is the issue, we would need to see the code to answer.
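As a minimal sketch of what the correct sign looks like (plain C++ rather than your CUDA kernels, with illustrative names since the question shows no code):

```cpp
// Gradient-descent update for a single weight. "dE_dw" is the partial
// derivative of the error with respect to the weight; all names here are
// illustrative assumptions, not taken from the asker's code.
double update_weight(double w, double dE_dw, double learning_rate) {
    // Correct: step AGAINST the gradient. Writing "+" here instead of "-"
    // makes each step increase the error, and the outputs blow up to infinity.
    return w - learning_rate * dE_dw;
}
```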
The amount of training data may also be too low, or the data being pushed into the model may be corrupted or collected without regard for data integrity. The activation function used in the network often gives good results, but if the target function is more complex the model can still fail to converge.
Divergence allows one neuron to communicate with many other neurons in a network. Convergence allows a neuron to receive input from many neurons in a network.
Input normalization: this is one of the most helpful methods for making a neural network converge faster. In many learning problems, training is faster when the training data sum to zero. You can normalize the input data by subtracting the mean value from each input variable, as in the sketch below.
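A minimal sketch of that mean-centering step, assuming the samples of one input variable are held in a std::vector (an illustrative layout, not from the answer):

```cpp
#include <numeric>
#include <vector>

// Subtract the mean from one input variable, in place.
void mean_center(std::vector<double>& values) {
    if (values.empty()) return;
    double mean = std::accumulate(values.begin(), values.end(), 0.0)
                  / values.size();
    for (double& v : values)
        v -= mean;  // after this loop the variable sums (approximately) to zero
}
```

Apply it separately to each input variable (here, to all the i values and to all the j values) before training.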
If the problem you are trying to solve is a classification problem, try a 3-layer network (3 layers are enough, according to Kolmogorov). Connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space, and that line divides the correct and incorrect half-spaces (see the sketch below). The connections from the hidden layer to the output put the hidden-layer values in correlation with each other, giving you the desired output.
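A minimal sketch of that line/half-space idea, with illustrative names and a hypothetical threshold (nothing here comes from the asker's code):

```cpp
// Hidden node C computes a weighted sum of inputs A and B; comparing it to a
// threshold tells which side of the line A*wa + B*wb = threshold the point
// (A, B) falls on, i.e. which half-space it belongs to.
bool on_positive_side(double A, double B,
                      double wa, double wb, double threshold) {
    double C = A * wa + B * wb;   // the hidden node's pre-activation
    return C > threshold;         // which half-space the input lies in
}
```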
Depending on your data, the error function may look like a hair comb, so implementing momentum should help (a sketch follows this paragraph). Keeping the learning rate at 1 proved optimal for me.
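A minimal sketch of a momentum update for one weight; the struct layout and the 0.9 coefficient are illustrative assumptions, not from the answer:

```cpp
struct Weight {
    double value    = 0.0;
    double velocity = 0.0;  // running blend of previous updates
};

void momentum_step(Weight& w, double dE_dw,
                   double learning_rate, double momentum = 0.9) {
    // Each step reuses part of the previous step, which smooths out the
    // "hair comb" ripples in the error surface instead of bouncing on them.
    w.velocity = momentum * w.velocity - learning_rate * dE_dw;
    w.value += w.velocity;
}
```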
Your training sessions will get stuck in local minima every once in a while, so training will consist of several successive sessions. If a session exceeds the maximum number of iterations, or the amplitude is too high, or the error is obviously high, the session has failed: start another.
At the beginning of each session, reinitialize your weights with random values in (-0.5, +0.5), as sketched below.
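A minimal sketch of that per-session reinitialization; the engine choice and the flat weight vector are illustrative assumptions:

```cpp
#include <random>
#include <vector>

// Give every weight a fresh random value in (-0.5, +0.5) so the next
// training session starts away from the previous local minimum.
void reinitialize(std::vector<double>& weights, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-0.5, 0.5);
    for (double& w : weights)
        w = dist(rng);
}
```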
It really helps to chart your error descent. You will get that "Aha!" moment.
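One easy way to do that, as a sketch (the CSV format and file handling are illustrative assumptions), is to log the error once per epoch and plot the file afterwards:

```cpp
#include <cstdio>

// Append one "epoch,error" row per training epoch; the resulting CSV can be
// charted in any plotting tool to watch the descent (or the divergence).
void log_error(std::FILE* f, int epoch, double error) {
    std::fprintf(f, "%d,%f\n", epoch, error);
}
```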