XOR neural network error stops decreasing during training

Tags:

I'm training a XOR neural network via back-propagation using stochastic gradient descent. The weights of the neural network are initialized to random values between -0.5 and 0.5. The neural network successfully trains itself around 80% of the time. However sometimes it gets "stuck" while backpropagating. By "stuck", I mean that I start seeing a decreasing rate of error correction. For example, during a successful training, the total error decreases rather quickly as the network learns, like so:

...
...
Total error for this training set: 0.0010008071327708653
Total error for this training set: 0.001000750550254843
Total error for this training set: 0.001000693973929822
Total error for this training set: 0.0010006374037948094
Total error for this training set: 0.0010005808398488103
Total error for this training set: 0.0010005242820908169
Total error for this training set: 0.0010004677305198344
Total error for this training set: 0.0010004111851348654
Total error for this training set: 0.0010003546459349181
Total error for this training set: 0.0010002981129189812
Total error for this training set: 0.0010002415860860656
Total error for this training set: 0.0010001850654351723
Total error for this training set: 0.001000128550965301
Total error for this training set: 0.0010000720426754587
Total error for this training set: 0.0010000155405646494
Total error for this training set: 9.99959044631871E-4

Testing trained XOR neural network
0 XOR 0: 0.023956746649767453
0 XOR 1: 0.9736079194769579
1 XOR 0: 0.9735670067093437
1 XOR 1: 0.045068688874314006

However when it gets stuck, the total errors are decreasing, but it seems to be at a decreasing rate:

...
...
Total error for this training set: 0.12325486644721295
Total error for this training set: 0.12325486642503929
Total error for this training set: 0.12325486640286581
Total error for this training set: 0.12325486638069229
Total error for this training set: 0.12325486635851894
Total error for this training set: 0.12325486633634561
Total error for this training set: 0.1232548663141723
Total error for this training set: 0.12325486629199914
Total error for this training set: 0.12325486626982587
Total error for this training set: 0.1232548662476525
Total error for this training set: 0.12325486622547954
Total error for this training set: 0.12325486620330656
Total error for this training set: 0.12325486618113349
Total error for this training set: 0.12325486615896045
Total error for this training set: 0.12325486613678775
Total error for this training set: 0.12325486611461482
Total error for this training set: 0.1232548660924418
Total error for this training set: 0.12325486607026936
Total error for this training set: 0.12325486604809655
Total error for this training set: 0.12325486602592373
Total error for this training set: 0.12325486600375107
Total error for this training set: 0.12325486598157878
Total error for this training set: 0.12325486595940628
Total error for this training set: 0.1232548659372337
Total error for this training set: 0.12325486591506139
Total error for this training set: 0.12325486589288918
Total error for this training set: 0.12325486587071677
Total error for this training set: 0.12325486584854453

While I was reading up on neural networks I came across a discussion on local minimas and global minimas and how neural networks don't really "know" which minima its supposed to be going towards.

Is my network getting stuck in a local minima instead of a global minima?

231

asked Nov 08 '11 06:11

Vivin Paliath

2 Answers

The paper by Hamey cited in @LiKao's answer proves there are no strict "regional local minima" for XOR in a 2-2-1 neural network. However, it admits "asymptotic minima" wherein the error surface flattens out as one or more weights approach infinity.

In practice, the weights don't even need to be so large for this to happen and it is quite common for a 2-2-1 net to get stuck in this flat asymptotic region. The reason for this is saturation: the gradient of sigmoid activation approaches 0 as the weights get large, so the network is unable to keep learning.

See my notebook experiment - typically around 2 or 3 out of 10 networks end up stuck, even after 10,000 epochs. Results differ slightly if you change the learning rate, batch size, activation or loss functions, initial weights, whether inputs are created randomly or in a fixed order, etc. but usually a network gets stuck now and then.

100

answered Sep 29 '22 13:09

Burrito

Poor gradient descent with excessively large steps as described by LiKao is one possible problem. Another is that there are very flat regions of the XOR error landscape which means that it takes a very long time to converge, and in fact the gradient may be so weak that descent algorithm doesn't pull you in the right direction.

These two papers look at 2-1-1 and 2-2-1 XOR landscapes. One uses a "cross entropy" error function which I don't know. In the first they declare there are no local minima but in the second they say there are local minima at infinity - basically when weights run off to very large values. So for the second case, their results suggest if you don't start off near "enough" true minima you may get trapped at the infinite points. They also say that other analyses of 2-2-1 XOR networks that show no local minima are not contradicted by their results because of particular definitions.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.4770

http://www.ncbi.nlm.nih.gov/pubmed/12662806

answered Sep 29 '22 14:09

William Chen

Related questions
                            
                                What's the difference between Conv layer and Pooling layer in CNN?
                            
                                how to predict new cases using the neuralnet package
                            
                                Test labels for regression caffe, float not allowed?
                            
                                What does tensorflow "op" do?
                            
                                How can I hot one encode in Matlab? [duplicate]
                            
                                Split data directory into training and test directory with sub directory structure preserved
                            
                                Where to start Handwritten Recognition using Neural Network?
                            
                                What are alternatives of Gradient Descent?
                            
                                TypeError: concat() got multiple values for argument 'axis'
                            
                                Matlab: neural network time series prediction?
                            
                                How to implement a generic neural network efficiently in Haskell?
                            
                                Calculate convolutional layer in CNN implementation
                            
                                Implementation of Liquid State Machines
                            
                                Reinforcement learning in C# [closed]
                            
                                Auto-encoders with tied weights in Caffe
                            
                                Neural network in Javascript not learning properly
                            
                                how is backpropagation the same (or not) as reverse automatic differentiation?
                            
                                Miminum requirements for Google tensorflow image classifier
                            
                                Perceptron learning algorithm doesn't work
                            
                                Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

XOR neural network error stops decreasing during training

Tags:

neural-network

backpropagation

Vivin Paliath

People also ask

2 Answers

Burrito

William Chen

Recent Activity

Donate For Us