 

How to change the temperature of a softmax output in Keras

I am currently trying to reproduce the results of the following article.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I am using Keras with the Theano backend. In the article, the author talks about controlling the temperature of the final softmax layer to give different outputs.

Temperature. We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures will give more diversity but at cost of more mistakes (e.g. spelling mistakes, etc). In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:
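
To make the quoted effect concrete, here is a minimal numpy sketch (my own, not from the article) that applies a few temperatures to a made-up logit vector before the softmax:

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # divide the logits by the temperature before normalizing
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]
for t in (0.5, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t))

Low temperatures sharpen the distribution (more confident samples); high temperatures flatten it (more diversity, more mistakes).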

My model is as follows.

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

model = Sequential()
# stateful character-level model: one 256-way one-hot character per timestep
model.add(LSTM(128, batch_input_shape=(batch_size, 1, 256), stateful=True, return_sequences=True))
model.add(LSTM(128, stateful=True))
model.add(Dropout(0.1))
model.add(Dense(256, activation='softmax'))

model.compile(optimizer=Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The only way I can think of to adjust the temperature of the final Dense layer would be to get the weight matrix and multiply it by the temperature. Does anyone know of a better way to do it? Also, if anyone sees anything wrong with how I set up the model, let me know, since I am new to RNNs.

asked May 16 '16 by chasep255

People also ask

What does temperature do in softmax?

Temperature will modify the output distribution of the mapping. For example:

low-temperature softmax probabilities: [0.01, 0.01, 0.98]
high-temperature softmax probabilities: [0.2, 0.2, 0.6]

What is temperature parameter softmax?

In practice, we often see softmax with temperature, which is a slight modification of softmax:

p_i = exp(x_i / τ) / Σ_{j=1}^{N} exp(x_j / τ)

The parameter τ is called the temperature parameter, and it is used to control the softness of the probability distribution.

What is the output of a softmax layer?

The output of a softmax is a vector (say v) with probabilities of each possible outcome. The probabilities in vector v sum to one over all possible outcomes or classes.


2 Answers

Well, it looks like the temperature is something you apply to the output of the softmax layer during sampling. I found this example:

https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

He applies the following function to sample from the softmax output.

import numpy as np

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature          # rescale the log-probabilities by the temperature
    a = np.exp(a) / np.sum(np.exp(a))    # re-normalize back to a probability distribution
    return np.argmax(np.random.multinomial(1, a, 1))
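
A typical way to use it in a generation loop, as a rough sketch (here x is assumed to be the current one-hot input batch of shape (batch_size, 1, 256), matching the model above):

probs = model.predict(x, batch_size=batch_size)[0]  # softmax output for the first sequence in the batch
next_index = sample(probs, temperature=0.5)         # lower temperature -> more conservative choices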
answered Sep 28 '22 by chasep255


The answer from @chasep255 works OK, but you will get warnings because of log(0). You can simplify the operation using the identity e^(log(a)/T) = a^(1/T) and get rid of the log:

import numpy as np

def sample(a, temperature=1.0):
    # raising to the power 1/T is equivalent to exp(log(a) / T)
    a = np.array(a) ** (1 / temperature)
    p_sum = a.sum()
    sample_temp = a / p_sum  # re-normalize to a probability distribution
    return np.argmax(np.random.multinomial(1, sample_temp, 1))
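
Both versions sample from the same distribution. As a quick sanity check (my own sketch, on a dummy probability vector), the two formulations agree numerically:

import numpy as np

a = np.array([0.1, 0.2, 0.7])
T = 0.5
log_version = np.exp(np.log(a) / T) / np.sum(np.exp(np.log(a) / T))
pow_version = a ** (1 / T) / np.sum(a ** (1 / T))
print(np.allclose(log_version, pow_version))  # True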

Hope it helps!

answered Sep 28 '22 by Julian