
Running out of memory during evaluation in PyTorch

Tags:

pytorch

I'm training a model in PyTorch. Every 10 epochs I evaluate the train and test error on the entire train and test datasets. For some reason the evaluation function causes an out-of-memory error on my GPU. This is strange because I use the same batch size for training and evaluation. I believe it's because net.forward() is called repeatedly and all the intermediate activations are kept in memory, but I'm not sure how to get around this.

def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    for step, (story, question, answer) in enumerate(loader):
        story = Variable(story)
        question = Variable(question)
        answer = Variable(answer)
        _, answer = torch.max(answer, 1)    # one-hot target -> class index

        if self.config.cuda:
            story = story.cuda()
            question = question.cuda()
            answer = answer.cuda()

        pred_prob = self.mem_n2n(story, question)[0]
        _, output_max_index = torch.max(pred_prob, 1)
        toadd = (answer == output_max_index).float().sum().data[0]
        correct = correct + toadd
        total = total + answer.size(0)      # 'captions' was undefined here

    acc = correct / total
    return acc
asked Nov 02 '17 by user3768533

2 Answers

I think it fails during validation because you don't call optimizer.zero_grad(). Internally, zero_grad() detaches each parameter's .grad tensor (making it a leaf again) and zeroes it, so gradients don't keep accumulating. It is commonly called on every iteration in the training loop.
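
A typical placement, as a sketch (optimizer and criterion are placeholders, not shown in the question):

for story, question, answer in self.train_loader:
    optimizer.zero_grad()                      # clear (and detach) accumulated gradients
    pred_prob = self.mem_n2n(story, question)[0]
    loss = criterion(pred_prob, torch.max(answer, 1)[1])
    loss.backward()
    optimizer.step()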

The volatile flag on Variable was removed in PyTorch 0.4.0. Ref - migration_guide_to_0.4.0

Starting from 0.4.0, use torch.no_grad() to avoid computing gradients during validation.

Code example from the migration guide:

# evaluate
with torch.no_grad():                   # operations inside don't track history
    for input, target in test_loader:
        ...

For 0.3.X, using volatile should work.
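
Applied to the evaluate function from the question, a minimal sketch for PyTorch >= 0.4.0 (Tensor and Variable are merged there, so the Variable wrappers are dropped and .item() replaces .data[0]):

def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    self.mem_n2n.eval()                  # evaluation mode (see the second answer)
    with torch.no_grad():                # no autograd history is recorded
        for story, question, answer in loader:
            _, answer = torch.max(answer, 1)
            if self.config.cuda:
                story = story.cuda()
                question = question.cuda()
                answer = answer.cuda()
            pred_prob = self.mem_n2n(story, question)[0]
            _, output_max_index = torch.max(pred_prob, 1)
            correct += (answer == output_max_index).float().sum().item()
            total += answer.size(0)
    self.mem_n2n.train()                 # back to training mode
    return correct / total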

answered Oct 07 '22 by MonsieurBeilto


I would suggest setting the volatile flag to True for all variables used during the evaluation:

    story = Variable(story, volatile=True)
    question = Variable(question, volatile=True)
    answer = Variable(answer, volatile=True)

That way, gradients and the operation history are not stored, and you will save a lot of memory. You can also delete references to those variables at the end of each batch:

del story, question, answer, pred_prob
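
Note that del only drops the Python references; PyTorch's caching allocator keeps the freed blocks for reuse inside the process. If you need the memory released back to the driver, torch.cuda.empty_cache() can be called, though it will not free memory that is still held by a retained autograd graph.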

Don't forget to set the model to evaluation mode (and back to training mode once evaluation is finished). For instance:

model.eval()
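
Using the names from the question, a usage sketch might look like:

self.mem_n2n.eval()              # disable dropout, use running batch-norm statistics
train_acc = self.evaluate("train")
test_acc = self.evaluate("test")
self.mem_n2n.train()             # restore training behavior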
answered Oct 07 '22 by Egor Lakomkin