I am currently trying to reproduce the results of the following article.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I am using Keras with the Theano backend. In the article, he talks about controlling the temperature of the final softmax layer to give different outputs.
Temperature. We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures will give more diversity but at cost of more mistakes (e.g. spelling mistakes, etc). In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:
My model is as follows.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(128, batch_input_shape = (batch_size, 1, 256), stateful = True, return_sequences = True))
model.add(LSTM(128, stateful = True))
model.add(Dropout(0.1))
model.add(Dense(256, activation = 'softmax'))
model.compile(optimizer = Adam(),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
The only way I can think of to adjust the temperature of the final Dense layer would be to get its weight matrix and scale it by the temperature. Does anyone know of a better way to do it? Also, if anyone sees anything wrong with how I set up the model, please let me know, since I am new to RNNs.
Temperature modifies the output distribution of the mapping. For example:

low temperature softmax probs: [0.01, 0.01, 0.98]
high temperature softmax probs: [0.20, 0.20, 0.60]
In practice, we often see softmax with temperature, which is a slight modification of the standard softmax:

p_i = exp(x_i / τ) / Σ_{j=1}^{N} exp(x_j / τ)

The parameter τ is called the temperature, and it controls the softness of the probability distribution.
The output of a softmax is a vector (say v) with the probability of each possible outcome. The probabilities in vector v sum to one over all possible outcomes or classes.
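To make this concrete, here is a rough numpy sketch (with made-up logits, not taken from the model above) showing how the temperature τ flattens or sharpens the distribution:

import numpy as np

def softmax_with_temperature(logits, tau=1.0):
    # divide the logits by the temperature before exponentiating
    x = np.asarray(logits, dtype=np.float64) / tau
    x = x - x.max()          # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

logits = [1.0, 1.0, 5.0]     # hypothetical logits
print(softmax_with_temperature(logits, tau=0.5))   # sharper, more confident
print(softmax_with_temperature(logits, tau=1.0))   # standard softmax
print(softmax_with_temperature(logits, tau=2.0))   # flatter, more diverse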
Well, it looks like the temperature is something you apply to the output of the softmax layer during sampling. I found this example.
https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
He applies the following function to sample from the softmax output.
import numpy as np

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
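For example, a sampling step with the stateful model from the question might look like this (x, batch_size, and current_char_index are hypothetical names, assuming one-hot encoded characters of size 256):

import numpy as np

# build a one-hot input for the current character (hypothetical setup)
x = np.zeros((batch_size, 1, 256))
x[0, 0, current_char_index] = 1.0

preds = model.predict(x, batch_size = batch_size)[0]   # softmax probabilities, shape (256,)
next_index = sample(preds, temperature = 0.5)          # lower temperature -> more conservative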
The answer from @chasep255 works OK, but you will get warnings because of log(0). You can simplify the operation, e^(log(a)/T) = a^(1/T), and get rid of the log:
import numpy as np

def sample(a, temperature=1.0):
    a = np.array(a) ** (1 / temperature)
    p_sum = a.sum()
    sample_temp = a / p_sum
    return np.argmax(np.random.multinomial(1, sample_temp, 1))
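As a quick sanity check (with a made-up probability vector), both forms give the same distribution, except that the first one warns on log(0):

import numpy as np

a = np.array([0.1, 0.2, 0.7])                  # made-up probability vector
T = 0.5
v1 = np.exp(np.log(a) / T); v1 /= v1.sum()     # original log/exp form
v2 = a ** (1 / T);          v2 /= v2.sum()     # simplified power form
print(np.allclose(v1, v2))                     # True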
Hope it helps!