
Implementation of softmax function returns nan for high inputs





I am trying to implement softmax at the end of a CNN, but the output I get is NaN and zeros. I am giving high input values to softmax, around 10-20k, for example an array X = [2345, 3456, 6543, -6789, -9234].

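My function is:

def softmax(X):
    B = np.exp(X)  # assumed: elementwise e^x (B and C were not shown in the post)
    C = np.sum(B)  # assumed: sum of all the exponentials
    return B / C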

I am getting a RuntimeWarning about an invalid value in true_divide:

C:\Anaconda\envs\deep_learning\lib\site-packages\ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
  after removing the cwd from sys.path.
Alok Ranjan Swain asked Feb 26 '19 07:02

People also ask

What is the problem of implementing the softmax function?

A common problem when applying softmax is numerical stability: because of the exponential, the sum ∑_j e^(z_j) can become too large to represent, which causes an overflow. This can be solved by subtracting the array's maximum value from every element before exponentiating.
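Why the shift is allowed: for a vector z of K values and any constant c, multiplying the numerator and denominator by e^{-c} leaves every output unchanged.

$$
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
= \frac{e^{-c}\, e^{z_i}}{e^{-c} \sum_{j=1}^{K} e^{z_j}}
= \frac{e^{z_i - c}}{\sum_{j=1}^{K} e^{z_j - c}}
$$

With c = max_j z_j, every exponent z_j - c is at most 0, so each term is at most 1 and the largest term is exactly 1. The denominator therefore lies between 1 and K, which rules out both overflow and division by zero.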

How do you implement a softmax function in Python?

np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True) computes the softmax of each row of a 2-D array z. Note that it does not subtract the maximum, so it can still overflow for large inputs.
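A minimal sketch of a numerically stable version of that row-wise formula, assuming z has shape (n_samples, n_classes). The function name softmax_rows is hypothetical; keepdims=True keeps the per-row max and sum as shape (n, 1) columns so they broadcast across each row.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax for a 2-D array z of shape (n_samples, n_classes)."""
    z = np.asarray(z, dtype=float)
    # Subtract each row's own maximum; shape (n, 1) broadcasts over the columns
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)  # every exponent is <= 0, so no overflow
    return e / np.sum(e, axis=1, keepdims=True)

scores = np.array([[1.0, 3.0, 5.0],
                   [2345.0, 3456.0, 6543.0]])
probs = softmax_rows(scores)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Each row is shifted by its own maximum rather than the global one, so a row of small values is not pushed into underflow by a large value in another row.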

What is softmax loss function?

In short, softmax loss is just a softmax activation followed by a cross-entropy loss. Softmax is an activation function that outputs a probability for each class, and these probabilities sum to one. Cross-entropy loss is the negative logarithm of the probability assigned to the correct class, summed (or averaged) over the samples.
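A minimal sketch of softmax plus cross-entropy computed directly from the raw scores (logits), assuming integer class labels and a mean over samples. The function name softmax_cross_entropy is hypothetical. It computes log-probabilities with the same max-shift (the log-sum-exp trick), so it never takes the log of an exponential that has already overflowed.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels.

    logits: shape (n_samples, n_classes); labels: shape (n_samples,), values in [0, n_classes).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # log p_ij = z_ij - m_i - log(sum_k exp(z_ik - m_i)), with m_i the row maximum
    m = np.max(logits, axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.sum(np.exp(logits - m), axis=1, keepdims=True))
    # Take the log-probability of the correct class for each sample and negate it
    n = logits.shape[0]
    return -np.mean(log_probs[np.arange(n), labels])

loss = softmax_cross_entropy([[2345, 3456, 6543, -6789, -9234]], [2])
print(loss)  # finite (no NaN), even though np.exp(6543) on its own would overflow
```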

What does the softmax function use to convert numbers to probabilities?

The softmax function is a function that turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities.
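A small check of those properties, sketched with a stable 1-D softmax (the helper below is illustrative, not a library function): inputs that are negative, zero, fractional or greater than one all map to values strictly between 0 and 1 that sum to 1, and because exp is increasing, a larger input always gets a larger probability.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax for a 1-D input: shift by the max, exponentiate, normalize
    x = np.asarray(x, dtype=float)
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = np.array([-1.5, 0.0, 0.5, 2.0])  # negative, zero, fractional and > 1 inputs
p = softmax(z)
print(p, p.sum())
```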


1 Answer

By definition, softmax computes the exponential of each element and divides it by the sum of the exponentials of all elements:

import numpy as np

a = [1, 3, 5]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))  # softmax value for element i


However, if the numbers are large, the exponentials overflow: in double precision, np.exp(x) returns inf for any x above about 709.78, and inf / inf is NaN:

a = [2345, 3456, 6543]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))  # np.exp overflows to inf, so this is inf / inf

__main__:2: RuntimeWarning: invalid value encountered in double_scalars
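To see where the NaNs and zeros in the question come from, here is the naive formula evaluated step by step on the question's array X. The intermediate names B and C mirror the question's function; the warnings are silenced with np.errstate only so the values are easy to inspect.

```python
import numpy as np

X = np.array([2345, 3456, 6543, -6789, -9234], dtype=float)

# Silence the overflow / invalid-value warnings so only the values are shown
with np.errstate(over="ignore", invalid="ignore"):
    B = np.exp(X)   # first three overflow to inf, last two underflow to 0.0
    C = np.sum(B)   # inf
    naive = B / C   # inf / inf -> nan, 0 / inf -> 0

print(B)
print(naive)
```

The three large positive inputs give NaN and the two large negative inputs give 0, which is exactly the "NaN and zeros" pattern described in the question.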

To avoid this, first shift the values so that the highest one becomes zero, i.e. subtract the maximum from every element, and then compute the softmax. For example, to compute the softmax of [1, 3, 5], use [1-5, 3-5, 5-5], which is [-4, -2, 0]. You may also implement it in a vectorized way (as you intended to do in the question):

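import numpy as np

def softmax(x):
    f = np.exp(x - np.max(x))  # shift values so the largest exponent is 0
    return f / f.sum(axis=0)

softmax([1, 3, 5])
# returns: array([0.01587624, 0.11731043, 0.86681333])

softmax([2345, 3456, 6543, -6789, -9234])
# returns: array([0., 0., 1., 0., 0.])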

For more detail, see the cs231n course notes: the section "Practical issues: Numeric stability" explains exactly this point.

Ersel Er answered Sep 26 '22 01:09