
Why use tanh for activation function of MLP?


I'm studying the theory of neural networks on my own and have some questions.

In many books and references, the hyperbolic tangent (tanh) function is used as the activation function for the hidden layers.

The books give a fairly simple reason: linear combinations of tanh functions can approximate nearly any function to within a given error.

But this raised some questions:

  1. Is this the real reason why the tanh function is used?
  2. If so, is it the only reason why the tanh function is used?
  3. If so, is the tanh function the only function that can do that?
  4. If not, what is the real reason?

I'm stuck here, going around in circles... please help me out of this mental trap!
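
To make the books' claim concrete, here is a minimal sketch (my own illustration, not from the books: it uses NumPy, an arbitrary target function sin(2x), and random hidden-unit parameters) showing that a linear combination of shifted and scaled tanh units can approximate a smooth function:

    import numpy as np

    # Approximate a 1-D function with a linear combination of tanh units,
    # i.e. f(x) ~= sum_j w_j * tanh(a_j * x + b_j).
    rng = np.random.default_rng(0)

    x = np.linspace(-3, 3, 200)[:, None]   # inputs, shape (200, 1)
    y = np.sin(2 * x)                      # target function (chosen arbitrarily)

    n_hidden = 50
    a = rng.normal(size=(1, n_hidden))     # random slopes of the tanh units
    b = rng.normal(size=n_hidden)          # random shifts of the tanh units
    H = np.tanh(x @ a + b)                 # hidden activations, shape (200, 50)

    # Fit only the output weights by least squares; even this restricted fit
    # is usually enough to drive the approximation error down.
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    print("max abs error:", np.abs(H @ w - y).max())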

forsythia asked Jun 18 '14 09:06


People also ask

Why do LSTMs use tanh?

In an LSTM network, the tanh activation function is used to compute the candidate cell state (internal state) values ( \tilde{C}_{t} ) and to update the hidden state ( h_{t} ).
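
For reference, the standard LSTM update equations behind that sentence, in the usual notation where f_t, i_t, o_t are the forget, input, and output gates, are commonly written as:

    \tilde{C}_t = \tanh(W_C \, [h_{t-1}, x_t] + b_C)
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
    h_t = o_t \odot \tanh(C_t)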

Which activation function is used for MLP?

Multilayer Perceptron (MLP): ReLU activation function. Convolutional Neural Network (CNN): ReLU activation function.

Why is tanh used in neural networks?

Hyperbolic Tangent Function (Tanh): The biggest advantage of the tanh function is that it produces a zero-centered output, thereby supporting the backpropagation process. The tanh function has mostly been used in recurrent neural networks for natural language processing and speech recognition tasks.

Why do we use non-linear activation functions such as ReLU and tanh in neural networks?

Why do we need non-linear activation functions: a neural network without an activation function is essentially just a linear regression model. The activation function performs a non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
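
A quick way to see the "just a linear regression model" point: without a non-linearity, stacking layers collapses into a single linear map. A minimal sketch (my own illustration with NumPy and arbitrary random weights):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 1))            # one input vector
    W1 = rng.normal(size=(5, 4))           # "hidden" layer weights
    W2 = rng.normal(size=(3, 5))           # output layer weights

    # Two stacked layers with no activation ...
    two_linear_layers = W2 @ (W1 @ x)
    # ... are exactly one linear layer with weights W2 @ W1.
    single_linear_layer = (W2 @ W1) @ x
    print(np.allclose(two_linear_layers, single_linear_layer))  # True

    # With a non-linearity in between, this collapse no longer happens.
    with_tanh = W2 @ np.tanh(W1 @ x)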


2 Answers

Most of the time, tanh converges faster than the sigmoid/logistic function and gives better accuracy [1]. However, the rectified linear unit (ReLU), proposed by Hinton [2], has been shown to train about six times faster than tanh [3] to reach the same training error. You can refer to [4] to see what benefits ReLU provides.
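
One way to see why tanh tends to converge faster than the logistic sigmoid is to compare their outputs and gradients. A small sketch (my own illustration, not from the cited papers):

    import numpy as np

    x = np.linspace(-4, 4, 9)

    sigmoid = 1.0 / (1.0 + np.exp(-x))
    tanh = np.tanh(x)

    # Derivatives: sigmoid'(x) = s(1 - s) peaks at 0.25, while
    # tanh'(x) = 1 - tanh(x)^2 peaks at 1.0, so gradients are larger and
    # the outputs are centered around 0 instead of 0.5.
    d_sigmoid = sigmoid * (1.0 - sigmoid)
    d_tanh = 1.0 - tanh ** 2
    print("max sigmoid gradient:", d_sigmoid.max())  # ~0.25 (near x = 0)
    print("max tanh gradient:   ", d_tanh.max())     # ~1.0  (near x = 0)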


Based on about two years of machine learning experience, I want to share some strategies that most papers use, along with my own experience in computer vision.

Normalizing input is very important

Normalizing well leads to better performance and faster convergence. Most of the time we subtract the mean so that the input has zero mean, which prevents the weights from all changing in the same direction and converging slowly [5]. Recently Google also pointed out this phenomenon, calling it internal covariate shift when training deep networks, and proposed batch normalization [6] to normalize each activation vector to zero mean and unit variance.
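
A minimal sketch of the input-normalization step described above (an assumption of mine: X is an (n_samples, n_features) NumPy design matrix):

    import numpy as np

    def normalize(X, eps=1e-8):
        """Zero-center each feature and scale it to unit variance,
        using statistics computed on the training set only."""
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        return (X - mean) / (std + eps), mean, std

    # Usage sketch: reuse the training statistics for validation/test data.
    X_train = np.random.default_rng(0).normal(5.0, 3.0, size=(100, 4))
    X_train_norm, mean, std = normalize(X_train)
    # X_test_norm = (X_test - mean) / (std + 1e-8)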

More data, more accuracy

More training data covers the feature space better and helps prevent overfitting. In computer vision, if the training data is not enough, the most commonly used techniques to enlarge the training set are data augmentation and synthesizing training data.
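
As a rough illustration of data augmentation (a hypothetical sketch, assuming images arrive as a NumPy array of shape (N, H, W, C)):

    import numpy as np

    def augment(images, rng):
        """Very simple augmentation: random horizontal flips plus a small
        random translation. Real pipelines add crops, color jitter, etc."""
        out = images.copy()
        flip = rng.random(len(out)) < 0.5
        out[flip] = out[flip, :, ::-1]           # mirror selected images left-right
        shift = rng.integers(-2, 3)              # shift the batch by -2..2 pixels
        out = np.roll(out, shift, axis=2)        # translate along the width axis
        return out

    rng = np.random.default_rng(0)
    batch = rng.random(size=(8, 32, 32, 3))      # fake image batch
    augmented = augment(batch, rng)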

Choosing a good activation function allows the network to train better and more efficiently.

The ReLU nonlinearity works better and has produced state-of-the-art results in deep learning and MLPs. Moreover, it has some benefits, e.g. it is simple to implement and cheaper to compute in back-propagation, so deeper networks can be trained efficiently. However, ReLU has zero gradient and does not train when a unit is inactive (its input is negative). Hence some modified ReLUs have been proposed, e.g. Leaky ReLU and Noisy ReLU; the most popular is PReLU [7], proposed by Microsoft, which generalizes the traditional rectified unit.
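
A compact sketch of the variants mentioned above (forward pass only, NumPy; the 0.01 slope and the alpha value are illustrative, not prescribed):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)                # zero output and zero gradient for x < 0

    def leaky_relu(x, slope=0.01):
        return np.where(x > 0, x, slope * x)     # small fixed negative slope

    def prelu(x, alpha):
        # Same form as Leaky ReLU, but alpha is a learned parameter
        # (per channel in the original PReLU formulation).
        return np.where(x > 0, x, alpha * x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(x), leaky_relu(x), prelu(x, alpha=0.25))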

Others

  • Choose a large initial learning rate, as long as training does not oscillate or diverge, so as to find a better global minimum.
  • Shuffle the training data between epochs (see the sketch below).
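
A minimal sketch of per-epoch shuffling (my own illustration; X, y, and the epoch loop are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))                # placeholder features
    y = rng.integers(0, 2, size=100)             # placeholder labels

    for epoch in range(3):
        perm = rng.permutation(len(X))           # new order every epoch
        X_shuffled, y_shuffled = X[perm], y[perm]
        # ... iterate over mini-batches of X_shuffled / y_shuffled ...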
RyanLiu answered Oct 14 '22 22:10


In truth, both the tanh and the logistic function can be used. The idea is that you can map any real number ([-Inf, Inf]) to a number in [-1, 1] or [0, 1] for tanh and the logistic function respectively. In this way, it can be shown that a combination of such functions can approximate any non-linear function. The preference for tanh over the logistic function is that the former is symmetric about 0 while the latter is not. This makes the logistic function more prone to saturating the later layers, which makes training more difficult.
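
The two functions are in fact rescaled versions of each other, tanh(x) = 2*sigmoid(2x) - 1, which makes the symmetry point easy to check numerically (a small sketch of my own):

    import numpy as np

    x = np.linspace(-3, 3, 7)
    sigmoid = 1.0 / (1.0 + np.exp(-x))

    # tanh is the logistic function rescaled to be symmetric about 0:
    # tanh(x) = 2 * sigmoid(2x) - 1, so its outputs are zero-centered
    # while the logistic outputs are centered around 0.5.
    print(np.allclose(np.tanh(x), 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0))  # True
    print("mean tanh output:   ", np.tanh(x).mean())       # ~0
    print("mean sigmoid output:", sigmoid.mean())           # ~0.5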

ASantosRibeiro answered Oct 14 '22 21:10