I am writing my master's thesis on applying LSTM neural networks to time series. In my experiments, I found that scaling the data can have a great impact on the results. For example, when I use a tanh activation function and scale the values to the range [-1, 1], the model seems to converge faster, and the validation error no longer jumps dramatically after each epoch.
Does anyone know of a mathematical explanation for this? Or are there any papers that already explain this situation?
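For concreteness, here is a minimal sketch of the kind of scaling I mean (assuming scikit-learn's MinMaxScaler; the toy series is only a placeholder for my real data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy univariate series standing in for the real data
series = np.arange(100, dtype=float).reshape(-1, 1)

# Fit the scaler on the training split only, to avoid leaking test statistics
train, test = series[:80], series[80:]
scaler = MinMaxScaler(feature_range=(-1, 1))  # match tanh's output range
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)
```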
Your question reminds me of a picture used in our class, but you can find a similar one here at 3:02.
In that picture you can clearly see that the gradient-descent path on the left is much longer than the one on the right. Unscaled features give the loss surface elongated, ill-conditioned contours, so gradient descent zig-zags across the narrow valley and takes many small steps; after scaling, the contours become closer to circular, the gradient points more directly at the minimum, and the path shortens. Scaling is what turns the left plot into the right one; a small sketch of this effect follows.
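To make the effect concrete, here is a minimal sketch (my own toy example, not from the video) that runs plain gradient descent on a hypothetical two-parameter quadratic loss. The ill-conditioned case stands in for unscaled inputs, the well-conditioned case for scaled ones; all constants, including the learning rates chosen near each case's stability limit, are illustrative assumptions.

```python
import numpy as np

def gd_steps(hessian_diag, lr, tol=1e-6, max_steps=10000):
    """Count gradient-descent steps on f(w) = 0.5 * sum(h_i * w_i**2)."""
    w = np.array([1.0, 1.0])
    h = np.asarray(hessian_diag)
    for step in range(1, max_steps + 1):
        grad = h * w          # gradient of the diagonal quadratic
        w = w - lr * grad
        if np.linalg.norm(grad) < tol:
            return step
    return max_steps

# Unscaled features -> ill-conditioned surface (elongated contours);
# the stable learning rate is capped by the steep direction, so the
# shallow direction crawls and the path is long.
print("ill-conditioned :", gd_steps([100.0, 1.0], lr=0.019))

# Scaled features -> well-conditioned surface (near-circular contours);
# one learning rate suits both directions and convergence is fast.
print("well-conditioned:", gd_steps([1.0, 1.0], lr=0.5))
```

On my assumptions the ill-conditioned run needs on the order of hundreds of steps while the well-conditioned one finishes in a few dozen, which is the same long-path-versus-short-path contrast the picture shows.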