I am trying to implement softmax at the end of a CNN. The output I get is NaNs and zeros. I am giving very large input values to softmax, around 10-20k; for example, the array X = [2345, 3456, 6543, -6789, -9234].
My function is

    import numpy as np

    def softmax(X):
        B = np.exp(X)
        C = np.sum(np.exp(X))
        return B / C
I am getting a true-divide RuntimeWarning:

    C:\Anaconda\envs\deep_learning\lib\site-packages\ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
      after removing the cwd from sys.path.
A common problem when applying softmax is numeric stability: the sum ∑j e^(z_j) can become very large because of the exponential, and overflow errors may occur. This overflow can be avoided by subtracting the maximum value of the array from each element before exponentiating.
np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True) reaches the same result as your softmax function when z is a 2-D batch of row vectors (for a single 1-D array, drop the axis argument).
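Putting both ideas together, a numerically stable, batched version might look like the sketch below (the name stable_softmax and the example scores are just illustrative assumptions, not code from the question):

    import numpy as np

    def stable_softmax(z):
        # Subtract the row-wise max before exponentiating; softmax is
        # shift-invariant, so this changes nothing mathematically but
        # keeps np.exp from overflowing to inf.
        shifted = z - np.max(z, axis=1, keepdims=True)
        exps = np.exp(shifted)
        return exps / np.sum(exps, axis=1, keepdims=True)

    # a hypothetical batch of two score vectors, one with huge magnitudes
    scores = np.array([[2345.0, 3456.0, 6543.0],
                       [10.0, 20.0, 30.0]])
    print(stable_softmax(scores))  # each row sums to 1, no nan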
In short, Softmax Loss is actually just a Softmax Activation plus a Cross-Entropy Loss. Softmax is an activation function that outputs a probability for each class, and these probabilities sum to one. Cross-entropy loss is then the sum, over examples, of the negative logarithm of the probability predicted for the true class.
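As a rough sketch of that relationship (the score values and class index below are made up for illustration):

    import numpy as np

    def softmax(z):
        # stable softmax over a 1-D score vector
        exps = np.exp(z - np.max(z))
        return exps / np.sum(exps)

    def cross_entropy(z, true_class):
        # negative log of the probability assigned to the true class
        probs = softmax(z)
        return -np.log(probs[true_class])

    scores = np.array([2.0, 1.0, 0.1])
    print(cross_entropy(scores, true_class=0))  # ~0.417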
The softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities.
According to the softmax function, you need to iterate over all elements in the array, compute the exponential of each individual element, then divide it by the sum of the exponentials of all elements:
    import numpy as np

    a = [1, 3, 5]
    for i in a:
        print(np.exp(i) / np.sum(np.exp(a)))

    0.015876239976466765
    0.11731042782619837
    0.8668133321973349
However, if the numbers are too big, the exponentials blow up: np.exp overflows float64 for arguments above roughly 709, producing inf, and inf/inf evaluates to nan:
    a = [2345, 3456, 6543]
    for i in a:
        print(np.exp(i) / np.sum(np.exp(a)))

    __main__:2: RuntimeWarning: invalid value encountered in double_scalars
    nan
    nan
    nan
To avoid this, first shift the values so that the highest one becomes zero, then compute the softmax. For example, to compute the softmax of [1, 3, 5], use [1-5, 3-5, 5-5], which is [-4, -2, 0]. You may also choose to implement it in a vectorized way (as you intended to do in the question):
    def softmax(x):
        f = np.exp(x - np.max(x))  # shift values so the max becomes 0
        return f / f.sum(axis=0)

    softmax([1, 3, 5])
    # prints: array([0.01587624, 0.11731043, 0.86681333])

    softmax([2345, 3456, 6543, -6789, -9234])
    # prints: array([0., 0., 1., 0., 0.])
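Note that [0., 0., 1., 0., 0.] is the mathematically expected answer here, not a failure: the gaps between your scores are in the thousands, so after shifting, every non-maximal entry is the exponential of a number like -3087 or smaller, which underflows to exactly 0 in float64.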
For detailed information, check out the cs231n course page; the "Practical issues: Numeric stability" heading is exactly what I'm trying to explain.