I am asking this question because I can successfully train a segmentation network on my laptop's RTX 2070 with 8 GB of VRAM, yet exactly the same code, with exactly the same software libraries installed, throws an out-of-memory error on my desktop PC with a GTX 1080 Ti and 11 GB of VRAM.
Why does this happen, considering that:
The same Windows 10 + CUDA 10.1 + cuDNN 7.6.5.32 + Nvidia driver 418.96 (the one bundled with CUDA 10.1) is installed on both the laptop and the PC.
Training with TensorFlow 2.3 runs smoothly on the GPU of my PC; it is only PyTorch that fails to allocate memory for training.
PyTorch recognises the GPU via print(torch.cuda.get_device_name(0)), which prints GTX 1080 Ti.
PyTorch can also allocate memory: running torch.rand(20000, 20000).cuda() allocates about 1.5 GB of VRAM (both checks are combined in the snippet below).
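For reference, the two checks above can be combined into a short sanity-check script (a sketch of what I ran, not part of the training code):

import torch

# Confirm that PyTorch sees the card; on the desktop this prints the GTX 1080 Ti.
print(torch.cuda.get_device_name(0))

# Confirm that a plain allocation works: a 20000 x 20000 float32 tensor
# is about 1.5 GB, matching the VRAM usage mentioned above.
x = torch.rand(20000, 20000).cuda()
print(round(torch.cuda.memory_allocated(0) / 1024**3, 2), "GB allocated")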
What is the solution to this?
Most people (even in the thread linked below) jump to suggest that decreasing the batch_size will solve the problem. In this case, it does not. It would be illogical for a network to train on 8 GB of VRAM and yet fail to train on 11 GB of VRAM, given that no other applications were consuming video memory on the 11 GB machine and that exactly the same configuration was installed and used.
The reason this happened in my case was that, when constructing the DataLoader object, I set a very high value (12) for the num_workers parameter. Decreasing this value to 4 solved the problem.
In fact, although it sits at the bottom of the thread, the answer provided by Yurasyk at https://github.com/pytorch/pytorch/issues/16417#issuecomment-599137646 is what pointed me in the right direction.
Solution: Decrease the number of workers in the PyTorch DataLoader. Although I do not fully understand why this works, I assume it is related to the worker processes spawned behind the scenes for data fetching; on some machines, having too many of them appears to trigger this error.
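As a minimal sketch of the fix (the dummy dataset and batch size below are placeholders invented for illustration; the only relevant change is lowering num_workers):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real segmentation data (placeholder).
train_dataset = TensorDataset(
    torch.rand(100, 3, 64, 64),                  # images
    torch.zeros(100, 64, 64, dtype=torch.long),  # masks
)

train_loader = DataLoader(
    train_dataset,
    batch_size=8,     # placeholder; the batch size was not the problem
    shuffle=True,
    num_workers=4,    # was 12; fewer worker processes resolved the OOM error
)

Note that on Windows, iterating a DataLoader with num_workers > 0 must happen under an if __name__ == "__main__": guard, since the workers are separate processes.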