
Running out of memory during evaluation in PyTorch

Tags:

pytorch

I'm training a model in PyTorch. Every 10 epochs I evaluate the train and test error on the entire train and test datasets. For some reason the evaluation function causes an out-of-memory error on my GPU. This is strange because I use the same batch size for training and evaluation. I believe it's because net.forward() is called repeatedly and all the intermediate activations are kept in memory, but I'm not sure how to get around this.

def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    for step, (story, question, answer) in enumerate(loader):
        story = Variable(story)
        question = Variable(question)
        answer = Variable(answer)
        _, answer = torch.max(answer, 1)    # one-hot target -> class index

        if self.config.cuda:
            story = story.cuda()
            question = question.cuda()
            answer = answer.cuda()

        pred_prob = self.mem_n2n(story, question)[0]
        _, output_max_index = torch.max(pred_prob, 1)
        toadd = (answer == output_max_index).float().sum().data[0]
        correct = correct + toadd
        total = total + answer.size(0)      # 'captions' was undefined here

    acc = correct / total
    return acc
asked Nov 02 '17 by user3768533

2 Answers

I think it fails during validation because you don't call optimizer.zero_grad(). Internally, zero_grad() detaches each parameter's .grad tensor (making it a leaf again) and zeroes it, so gradients don't keep accumulating. It is commonly called on every iteration in the training loop.
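
A typical placement, as a sketch (optimizer and criterion are placeholders, not shown in the question):

for story, question, answer in self.train_loader:
    optimizer.zero_grad()                      # clear (and detach) accumulated gradients
    pred_prob = self.mem_n2n(story, question)[0]
    loss = criterion(pred_prob, torch.max(answer, 1)[1])
    loss.backward()
    optimizer.step()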

The volatile flag on Variable was removed in PyTorch 0.4.0. Ref - migration_guide_to_0.4.0

Starting from 0.4.0, use torch.no_grad() to avoid computing gradients during validation.

Code example from the migration guide:

# evaluate
with torch.no_grad():                   # operations inside don't track history
    for input, target in test_loader:
        ...

For 0.3.X, using volatile should work.
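
Applied to the evaluate function from the question, a minimal sketch for PyTorch >= 0.4.0 (Tensor and Variable are merged there, so the Variable wrappers are dropped and .item() replaces .data[0]):

def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    self.mem_n2n.eval()                  # evaluation mode (see the second answer)
    with torch.no_grad():                # no autograd history is recorded
        for story, question, answer in loader:
            _, answer = torch.max(answer, 1)
            if self.config.cuda:
                story = story.cuda()
                question = question.cuda()
                answer = answer.cuda()
            pred_prob = self.mem_n2n(story, question)[0]
            _, output_max_index = torch.max(pred_prob, 1)
            correct += (answer == output_max_index).float().sum().item()
            total += answer.size(0)
    self.mem_n2n.train()                 # back to training mode
    return correct / total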

answered Oct 07 '22 by MonsieurBeilto


I would suggest setting the volatile flag to True for all variables used during the evaluation:

    story = Variable(story, volatile=True)
    question = Variable(question, volatile=True)
    answer = Variable(answer, volatile=True)

That way, gradients and the operation history are not stored, and you will save a lot of memory. You can also delete references to those variables at the end of each batch:

del story, question, answer, pred_prob
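
Note that del only drops the Python references; PyTorch's caching allocator keeps the freed blocks for reuse inside the process. If you need the memory released back to the driver, torch.cuda.empty_cache() can be called, though it will not free memory that is still held by a retained autograd graph.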

Don't forget to set the model to evaluation mode (and back to training mode once evaluation is finished). For instance:

model.eval()
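
Using the names from the question, a usage sketch might look like:

self.mem_n2n.eval()              # disable dropout, use running batch-norm statistics
train_acc = self.evaluate("train")
test_acc = self.evaluate("test")
self.mem_n2n.train()             # restore training behavior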
answered Oct 07 '22 by Egor Lakomkin