I have just started programming neural networks. I am currently trying to understand how a backpropagation (BP) neural net works. While the algorithm for training BP nets is quite straightforward, I was unable to find any text on why the algorithm works. More specifically, I am looking for some mathematical reasoning to justify the use of sigmoid functions in neural nets, and an explanation of what makes them mimic almost any data distribution thrown at them.
Thanks!
The sigmoid function acts as an activation function in machine learning: it is used to add non-linearity to a model. In simple terms, it decides how much of a neuron's weighted input is passed on as output. It is one of several activation functions commonly used in machine learning and deep learning.
The backpropagation learning rule relies on the fact that the sigmoid function is differentiable, which makes it possible to characterize the rate of change in the output layer error with respect to a change in a particular weight (even if the weight is multiple layers away from the output).
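For example, here is a minimal NumPy sketch (the variable names and values are my own, purely illustrative) showing how the derivative of the sigmoid, which can be written as sigmoid(x) * (1 - sigmoid(x)), lets you turn an output error into a gradient for a weight via the chain rule:

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)), output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # The derivative can be computed from the function's own output:
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# One neuron, one training example (illustrative values)
x = np.array([0.5, -1.2])        # inputs
w = np.array([0.8, 0.3])         # weights
target = 1.0

z = np.dot(w, x)                 # pre-activation (weighted sum)
y = sigmoid(z)                   # neuron output

# Squared-error loss E = 0.5 * (y - target)^2
# Chain rule: dE/dw_i = (y - target) * sigmoid'(z) * x_i
grad_w = (y - target) * sigmoid_derivative(z) * x

# One gradient-descent step with a small learning rate
w -= 0.1 * grad_w
```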
The sigmoid is a function whose output lies strictly between 0 and 1, approaching both values asymptotically. This makes it very handy for binary classification with 0 and 1 as the potential output values.
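As a small illustration (the 0.5 cut-off below is just the usual convention, not something the function itself dictates), the bounded output can be read directly as a class score and thresholded:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-6.0, -0.5, 0.0, 2.0, 6.0])
probs = sigmoid(scores)              # all values lie strictly between 0 and 1
labels = (probs >= 0.5).astype(int)  # conventional decision threshold

print(probs)   # approximately [0.0025 0.3775 0.5    0.8808 0.9975]
print(labels)  # [0 0 1 1 1]
```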
A basic building block of deep neural networks is the sigmoid neuron. Sigmoid neurons are similar to perceptrons, but they are modified so that the output of a sigmoid neuron is much smoother than the step-function output of a perceptron.
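To make that contrast concrete, here is a rough sketch (a toy comparison of my own, not from any particular source) of a perceptron's hard-threshold output against a sigmoid neuron's smooth output for the same pre-activation values:

```python
import numpy as np

def step(z):
    # Perceptron: hard threshold, output jumps abruptly from 0 to 1
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Sigmoid neuron: smooth, differentiable transition between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
print(step(z))     # [0. 0. 0. 0. 1. 1. 1. 1. 1.]
print(sigmoid(z))  # values rise gradually from about 0.018 to about 0.982
```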
The sigmoid function introduces non-linearity into the network. Without a non-linear activation function, the net can only learn functions which are linear combinations of its inputs. With a non-linear activation such as the sigmoid, a feedforward network with even a single hidden layer can approximate any continuous function to arbitrary accuracy. This result is called the universal approximation theorem, or Cybenko theorem, after the gentleman who proved it in 1989. Wikipedia is a good place to start, and it has a link to the original paper (the proof is somewhat involved, though). The reason why you would use a sigmoid as opposed to something else is that it is continuous and differentiable, its derivative is very fast to compute (as opposed to the derivative of tanh, which otherwise has similar properties), and it has a limited range (from 0 to 1, exclusive).
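As a rough illustration of both points (the non-linearity and the cheap derivative), here is a minimal NumPy sketch, with layer sizes, seed, and learning rate chosen arbitrarily, that trains a one-hidden-layer sigmoid network with backpropagation on XOR, a function no purely linear model can represent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: not linearly separable, so a model without non-linearity cannot fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 0.5

for _ in range(10000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)          # hidden activations
    Y = sigmoid(H @ W2 + b2)          # network output

    # Backward pass: sigmoid'(z) computed cheaply from the output as y * (1 - y)
    dY = (Y - T) * Y * (1 - Y)        # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)    # hidden-layer delta via the chain rule

    # Gradient-descent updates
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

# Should end up close to [[0], [1], [1], [0]]; a different seed or
# learning rate may occasionally need adjusting to converge.
print(Y.round(2))
```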