I'm training a model in PyTorch. Every 10 epochs, I evaluate the train and test error on the entire train and test datasets. For some reason the evaluation function causes an out-of-memory error on my GPU. This is strange because I use the same batch size for training and evaluation. I believe it's because net.forward() is called repeatedly and all the hidden values are kept in memory, but I'm not sure how to get around this.
def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    for step, (story, question, answer) in enumerate(loader):
        story = Variable(story)
        question = Variable(question)
        answer = Variable(answer)
        _, answer = torch.max(answer, 1)
        if self.config.cuda:
            story = story.cuda()
            question = question.cuda()
            answer = answer.cuda()
        pred_prob = self.mem_n2n(story, question)[0]
        _, output_max_index = torch.max(pred_prob, 1)
        toadd = (answer == output_max_index).float().sum().data[0]
        correct = correct + toadd
        total = total + answer.size(0)  # was captions.size(0); captions is undefined here
    acc = correct / total
    return acc
I think it fails during validation because you don't call optimizer.zero_grad(). zero_grad() executes detach on the gradients, making each gradient tensor a leaf. It is commonly called inside the training loop.
The volatile flag on Variable was removed in PyTorch 0.4.0. Ref: the migration guide to 0.4.0.
Starting from 0.4.0, to avoid computing gradients during validation, use torch.no_grad().
Code example from the migration guide:
# evaluate
with torch.no_grad():  # operations inside don't track history
    for input, target in test_loader:
        ...
For 0.3.X, using volatile should work.
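For 0.4.0 and later, the torch.no_grad() pattern applied to a full accuracy loop might look like the sketch below. It is a simplified illustration, not the asker's exact code: the function signature, the single-input model, the integer class targets and the toy data are all assumptions made to keep it runnable.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device="cpu"):
    """Return classification accuracy of `model` over `loader`."""
    correct = 0
    total = 0
    with torch.no_grad():  # no autograd graph is built inside this block
        for inputs, targets in loader:
            inputs = inputs.to(device)
            targets = targets.to(device)
            logits = model(inputs)
            preds = logits.argmax(dim=1)
            # .item() turns the 0-dim tensor into a plain Python number,
            # so no tensor is kept alive across batches
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    return correct / total


# Hypothetical stand-in model and data, only to make the sketch runnable.
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
accuracy = evaluate(toy_model, toy_loader)
```

Accumulating with .item() replaces the .data[0] idiom from the question, which no longer works on 0-dim tensors in recent PyTorch versions.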
I would suggest setting the volatile flag to True for all variables used during the evaluation:
story = Variable(story, volatile=True)
question = Variable(question, volatile=True)
answer = Variable(answer, volatile=True)
This way, the gradients and operation history are not stored, and you will save a lot of memory. Also, you could delete references to those variables at the end of each batch:
del story, question, answer, pred_prob
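On PyTorch 0.4.0 and later, where volatile no longer exists, one way to check that no operation history is being kept is to look at the grad_fn attribute of an output. The snippet below is a small sketch using a hypothetical nn.Linear layer as a stand-in for the real network.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)  # hypothetical stand-in for the real network
batch = torch.randn(2, 4)

tracked = layer(batch)  # autograd records this operation

with torch.no_grad():
    untracked = layer(batch)  # nothing is recorded here

# An output with a grad_fn keeps its computation graph (and the
# intermediate activations it references) alive in memory.
has_history = tracked.grad_fn is not None
no_history = untracked.grad_fn is None
```

If an evaluation output still has a grad_fn, the graph behind it stays in memory for as long as that output is referenced.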
Don't forget to set the model to evaluation mode (and back to train mode once the evaluation is finished). For instance:
model.eval()
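A minimal sketch of the full round trip, assuming a hypothetical model that contains a dropout layer. Note that model.eval() only changes the behaviour of layers such as dropout and batch norm; it does not stop gradient tracking, so on 0.4.0 and later it is usually combined with torch.no_grad().

```python
import torch
import torch.nn as nn

# Hypothetical model; dropout behaves differently in train and eval mode.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))

model.eval()  # switch dropout / batch-norm layers to evaluation behaviour
mode_during_eval = model.training

with torch.no_grad():  # separately, stop autograd from recording history
    outputs = model(torch.randn(2, 4))

model.train()  # restore training behaviour before the next epoch
mode_after_eval = model.training
```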