What will happen if I multiply the loss function by a constant? I think I will get a larger gradient, right? Is it equivalent to using a larger learning rate?
Basically, it depends on which learning method you use:
If you use classic stochastic / batch / full-batch learning with the update rule

new_weights = old_weights - learning_rate * gradient

then your claim is true: the gradient of c * loss is c * gradient, so by commutativity of multiplication the update is exactly the same as using a learning rate of c * learning_rate.
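A minimal sketch of this equivalence (plain NumPy, with a hypothetical quadratic loss standing in for the real one): scaling the loss by c under the update rule above produces exactly the same weights as scaling the learning rate by c.

```python
import numpy as np

def grad(w):
    # Gradient of a toy loss L(w) = 0.5 * ||w||^2, used only for illustration.
    return w

w_scaled_loss = np.array([1.0, -2.0])  # run 1: loss multiplied by c
w_scaled_lr   = np.array([1.0, -2.0])  # run 2: learning rate multiplied by c
lr, c = 0.1, 5.0

for _ in range(10):
    w_scaled_loss = w_scaled_loss - lr * (c * grad(w_scaled_loss))
    w_scaled_lr   = w_scaled_lr   - (c * lr) * grad(w_scaled_lr)

print(np.allclose(w_scaled_loss, w_scaled_lr))  # True: identical trajectories
```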
If you are using a learning method with an adaptive learning rate (like Adam or RMSprop), things change a little. Your gradients are still multiplied by the constant, but the effective step size need not change in the same way: these methods divide each gradient by a running estimate of its own magnitude, so a constant factor largely cancels out (up to the epsilon term and the warm-up of the moment estimates). The exact effect depends on how the rescaled cost interacts with the particular algorithm, but scaling the loss is no longer equivalent to scaling the learning rate.
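To see why the constant mostly cancels with Adam, here is a hand-rolled sketch of its update equations (not a library call; the quadratic toy loss is again only an assumption for illustration), run once with the plain gradient and once with the gradient multiplied by 100:

```python
import numpy as np

def adam_steps(scale, steps=50, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    w = np.array([1.0, -2.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = scale * w                    # gradient of scale * 0.5 * ||w||^2
        m = b1 * m + (1 - b1) * g        # first-moment estimate scales with `scale`
        v = b2 * v + (1 - b2) * g ** 2   # second-moment estimate scales with `scale`^2
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(adam_steps(1.0))    # loss as-is
print(adam_steps(100.0))  # loss multiplied by 100: nearly the same result
```

Since m_hat scales with the constant and sqrt(v_hat) scales with it too, the ratio m_hat / (sqrt(v_hat) + eps) is almost unchanged; only the small eps term breaks the exact invariance.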
If you use a learning method with an adaptive gradient but a fixed learning rate (e.g. momentum methods), the multiplication usually acts the same way as in the first case: the accumulated gradient scales linearly with the constant, so it again behaves like a larger learning rate (see the sketch below).
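A matching sketch for classical momentum, on the same toy quadratic loss: scaling the gradient by a constant scales every accumulated step by that constant, so the trajectory coincides with one that simply uses a larger learning rate.

```python
import numpy as np

def momentum_steps(scale, steps=10, lr=0.1, mu=0.9):
    w = np.array([1.0, -2.0])
    vel = np.zeros_like(w)
    for _ in range(steps):
        g = scale * w             # gradient of scale * 0.5 * ||w||^2
        vel = mu * vel + g        # velocity accumulates the scaled gradient linearly
        w = w - lr * vel
    return w

print(momentum_steps(2.0))           # loss scaled by 2, learning rate 0.1
print(momentum_steps(1.0, lr=0.2))   # unscaled loss, learning rate doubled
# Both runs print the same weights, matching the plain-SGD case above.
```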
Yes, you are right. It is equivalent to changing the learning rate.