I have a question about the following code from a PyTorch language-modeling script:
print("Training and generating...")
for epoch in range(1, config.num_epochs + 1):
total_loss = 0.0
model.train()
hidden = model.init_hidden(config.batch_size)
for ibatch, i in enumerate(range(0, train_len - 1, seq_len)):
data, targets = get_batch(train_data, i, seq_len)
hidden = repackage_hidden(hidden)
model.zero_grad()
output, hidden = model(data, hidden)
loss = criterion(output.view(-1, config.vocab_size), targets)
loss.backward()
Please check line 5 of the snippet, hidden = model.init_hidden(config.batch_size).
And the init_hidden function is as follows:
def init_hidden(self, bsz):
    weight = next(self.parameters()).data
    if self.rnn_type == 'LSTM':  # lstm: (h0, c0)
        return (Variable(weight.new(self.n_layers, bsz, self.hi_dim).zero_()),
                Variable(weight.new(self.n_layers, bsz, self.hi_dim).zero_()))
    else:  # gru & rnn: h0
        return Variable(weight.new(self.n_layers, bsz, self.hi_dim).zero_())
My question is:
Why do we need to call init_hidden every epoch? Shouldn't the model inherit the hidden parameters from the last epoch and continue training from them?
To train a recurrent neural network, you use an application of back-propagation called back-propagation through time (BPTT). The gradient values shrink exponentially as they propagate back through each time step, and those gradients are what adjust the network's weights and allow it to learn.
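This is also why the training loop above calls repackage_hidden before each batch: it cuts the graph so BPTT only runs over the current sequence chunk. Your implementation isn't shown, but in the standard PyTorch word_language_model example it looks roughly like the following sketch (written with the modern detach() API rather than the older Variable wrapper used in your code):

import torch

def repackage_hidden(h):
    # Detach the hidden state from the graph built for the previous batch,
    # so back-propagation through time is truncated at the batch boundary
    # instead of reaching all the way back to the start of the epoch.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)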
The answer lies in what init_hidden actually creates. It is not the hidden-layer weights but the initial hidden state of the RNN/LSTM, the h0 in the formulas. At the start of every epoch we re-initialize it to a fresh zero state, because at test time the model has no information about the test sentence and likewise starts from a zero initial hidden state.
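For intuition, here is a rough sketch of what the matching evaluation loop might look like, reusing model, criterion, get_batch, seq_len and config from your snippet (config.eval_batch_size is a hypothetical name): the key point is that it starts from the same zero state that init_hidden gives you in training.

def evaluate(data_source):
    # Evaluation starts from exactly the same zero initial state as training,
    # so the model never sees a "carried over" hidden state at test time.
    model.eval()
    total_loss = 0.0
    hidden = model.init_hidden(config.eval_batch_size)  # hypothetical eval batch size
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, seq_len):
            data, targets = get_batch(data_source, i, seq_len)
            output, hidden = model(data, hidden)
            hidden = repackage_hidden(hidden)
            total_loss += len(data) * criterion(output.view(-1, config.vocab_size), targets).item()
    return total_loss / (data_source.size(0) - 1)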
The hidden state stores the internal state of the RNN built up from the previous tokens in the current sequence; this is what allows an RNN to understand context. At each step, the hidden state is computed from the previous token's output.
When you predict the first token of a new sequence, retaining the hidden state from the previous sequence would make the model behave as if the new sequence were a continuation of the old one, which gives worse results. Instead, for the first token you initialise an empty (zero) hidden state, which the model then fills and uses for the second token.
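A tiny standalone illustration of this threading, with hypothetical sizes rather than your model: the first token of a new sequence is fed with a zero hidden state, and the returned state is passed along for the following tokens.

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16, num_layers=1)  # toy model, hypothetical sizes
h = torch.zeros(1, 1, 16)      # fresh (zero) state at the start of a new sequence

for t in range(5):             # feed 5 tokens one at a time
    x = torch.randn(1, 1, 8)   # stand-in for one embedded token: (seq_len=1, batch=1, input_size)
    out, h = rnn(x, h)         # h now carries the context of tokens 0..t

h = torch.zeros(1, 1, 16)      # to start an unrelated sequence, reset h instead of reusing it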
Think about it this way: imagine being asked to classify a sentence after being handed the US Constitution (irrelevant information), versus being given some relevant background context about the sentence first.