I know how to make softmax numerically stable by subtracting max_i x_i from each element. This avoids overflow and underflow. However, taking the log of the result can still cause problems: softmax(x) can underflow to zero, and log(0) evaluates to -infinity.
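For example, here is a minimal numpy sketch of what I mean:

import numpy as np

def softmax(x):
    # Stabilized softmax: subtract the max before exponentiating.
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

p = softmax(np.array([0.0, 1000.0]))  # the first entry underflows to 0.0
print(np.log(p))                      # [-inf, 0.] -- log(0) is -infinity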
I am not sure how to fix it. I know this is a common problem. I read several answers on it, which I didn't understand, so I am still confused about how to solve this problem.
PS: If you provide a simple example, it would be awesome.
In order to stabilize logsoftmax, most implementations, such as TensorFlow and Theano, use a trick that takes out the largest component max(x_i). This trick is often used for stably computing softmax. For logsoftmax, we begin with:

log softmax(x_i) = log( exp(x_i) / sum_j exp(x_j) )
                 = log( (exp(x_i - b) * exp(b)) / (exp(b) * sum_j exp(x_j - b)) )

After extracting out the exp(b) and using the fact that log(exp(x)) = x, we have:

log softmax(x_i) = (x_i - b) - log( sum_j exp(x_j - b) )

If we set b = max(x_i), this new equation has both overflow and underflow stability conditions.
In terms of code, if x is a vector:
import numpy as np

def log_softmax(x):
    # Shift by the max so that exp() cannot overflow.
    x_off = x - np.max(x)
    # The sum is >= 1 (the max term contributes exp(0) = 1), so the log is safe.
    return x_off - np.log(np.sum(np.exp(x_off)))
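As a quick check on the example from the question, where naive log(softmax(x)) returned -inf:

x = np.array([0.0, 1000.0])
print(log_softmax(x))  # [-1000., 0.] -- finite, no -inf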
See also: https://timvieira.github.io/blog/post/2014/02/11/exp-normalize-trick/
This is the formula given in the TensorFlow documentation:

logsoftmax = logits - log(reduce_sum(exp(logits), dim))

Refer to: https://www.tensorflow.org/api_docs/python/tf/nn/log_softmax
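A minimal usage sketch, assuming TensorFlow 2.x is installed (tf.nn.log_softmax is numerically stable internally, despite the simple formula quoted above):

import tensorflow as tf

logits = tf.constant([0.0, 1000.0])
print(tf.nn.log_softmax(logits))  # [-1000., 0.] -- finite, no -inf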