I am trying to get the output of a neural network that I have already trained. The input is an image of size 300x300. I am using a batch size of 1, but I still get a CUDA error: out of memory after successfully getting the output for 25 images.
I tried torch.cuda.empty_cache(), but this still doesn't seem to solve the problem.
Code:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train_x = torch.tensor(train_x, dtype=torch.float32).view(-1, 1, 300, 300)
train_x = train_x.to(device)
dataloader = torch.utils.data.DataLoader(train_x, batch_size=1, shuffle=False)
right = []
for i, left in enumerate(dataloader):
    print(i)
    temp = model(left).view(-1, 1, 300, 300)
    right.append(temp.to('cpu'))

    del temp
    torch.cuda.empty_cache()
This for loop always runs for 25 iterations before raising the memory error.
On each iteration, I am sending a new image into the network for computation, so I don't really need to keep the previous computation results in GPU memory after each iteration of the loop. Is there any way to achieve this?
PyTorch uses a memory cache to avoid malloc/free calls and tries to reuse the memory, if possible, as described in the docs. To release memory from the cache so that other processes can use it, you could call torch.cuda.empty_cache().
In my model, the "cuda runtime error(2): out of memory" appeared to be caused by GPU memory gradually being exhausted. Because PyTorch typically manages large amounts of data, overlooking a small mistake like this can make your program run out of GPU memory and crash.
empty_cache() releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.
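As a minimal sketch of what that means in practice (the tensor size here is arbitrary, chosen just for illustration), you can compare the allocator's counters before and after the call:

import torch

device = torch.device("cuda:0")

# Allocate a tensor, then drop the Python reference to it.
x = torch.randn(1024, 1024, device=device)
del x

# The block is returned to PyTorch's caching allocator, but the memory is
# still reserved from the driver's point of view (still shown in nvidia-smi).
print(torch.cuda.memory_allocated(device))  # bytes currently used by live tensors
print(torch.cuda.memory_reserved(device))   # bytes held by the caching allocator

# empty_cache() hands the unoccupied cached blocks back to the driver.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved(device))

Note that empty_cache() only helps other processes see the memory; it does not reduce the memory your own tensors and autograd graph are actively using, which is why it did not fix the original problem.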
I figured out where I was going wrong. I am posting the solution as an answer for others who might be struggling with the same problem.
Basically, whenever I pass data through my network, PyTorch builds a computational graph and keeps the intermediate computations in GPU memory in case I want to calculate gradients during backpropagation. But since I only wanted to perform a forward pass, I simply needed to wrap it in torch.no_grad() for my model.
Thus, the for loop in my code could be rewritten as:
for i, left in enumerate(dataloader):
    print(i)
    with torch.no_grad():
        temp = model(left).view(-1, 1, 300, 300)
    right.append(temp.to('cpu'))

    del temp
    torch.cuda.empty_cache()
Wrapping the forward pass in torch.no_grad() tells PyTorch that I don't need to keep the intermediate computations for backpropagation, which frees up GPU memory.
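For reference, an equivalent variant (my own sketch, not part of the original answer) wraps the whole loop in a single no_grad() context and also puts the model in eval mode, so the context is entered only once and layers like dropout and batch norm behave as they should at inference time:

model.eval()                        # inference behaviour for dropout/batch norm
right = []
with torch.no_grad():               # no autograd graph is built inside this block
    for i, left in enumerate(dataloader):
        temp = model(left).view(-1, 1, 300, 300)
        right.append(temp.to('cpu'))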