I have a next-step prediction model for time series, which is simply a GRU with a fully-connected layer on top of it. When I train it on the CPU, the loss is 0.10 after 50 epochs, but when I train it on the GPU the loss is 0.15 after 50 epochs. Running more epochs does not really lower the loss in either case.
Why is performance after training on the CPU better than on the GPU?
I have tried changing the random seeds for both data and model, and these results are independent of the random seeds.
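For reference, the seeding was along these lines (a sketch; the exact calls and seed value are illustrative, not my actual code):

    # Illustrative seeding: fix every relevant RNG so CPU and GPU runs can be
    # compared across repeated trials.
    import random
    import numpy as np
    import torch

    seed = 42  # illustrative value
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)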
I have:
Python 3.6.2
PyTorch 0.3.0
CUDNN_MAJOR 7
CUDNN_MINOR 0
CUDNN_PATCHLEVEL 5
Edit:
I also use PyTorch's weight normalization, torch.nn.utils.weight_norm, on the GRU and on the fully-connected layer.
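For completeness, here is a minimal sketch of the kind of model described above; the class name, layer sizes and other details are illustrative, not my exact code:

    import torch
    import torch.nn as nn
    from torch.nn.utils import weight_norm

    class NextStepGRU(nn.Module):
        def __init__(self, input_size=1, hidden_size=64):
            super(NextStepGRU, self).__init__()
            gru = nn.GRU(input_size, hidden_size, batch_first=True)
            # nn.GRU has no parameter literally named "weight", so weight_norm
            # is applied to each weight matrix by name.
            for name in ['weight_ih_l0', 'weight_hh_l0']:
                gru = weight_norm(gru, name=name)
            self.gru = gru
            self.fc = weight_norm(nn.Linear(hidden_size, input_size))

        def forward(self, x, hidden=None):
            # x: (batch, seq_len, input_size) because batch_first=True
            out, hidden = self.gru(x, hidden)
            # predict the next value from the last time step
            return self.fc(out[:, -1, :]), hidden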
After trying many things, I think I found the problem. Apparently the cuDNN backend does not behave correctly in this setup. I don't know whether it is a bug in PyTorch or a bug in cuDNN, but disabling cuDNN with

    torch.backends.cudnn.enabled = False

solves the problem. With the line above, training on the GPU gives the same loss at the same epoch as training on the CPU.
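For clarity, a sketch of where the flag goes; it just has to run before the first forward pass on the GPU (the model name below refers to the illustrative sketch in the question):

    import torch

    # Disable the cuDNN backend globally so the GRU falls back to PyTorch's
    # native CUDA kernels. Set this before the first forward pass.
    torch.backends.cudnn.enabled = False

    model = NextStepGRU().cuda()  # illustrative model from the question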
Edit:
It seems to be the interaction of weight normalization and cuDNN that causes the problem. If I remove weight normalization, it works. If I disable cuDNN, it works. It seems that only the combination of the two fails in PyTorch.
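A related check (a sketch, again using the illustrative model from the question): the weight_norm reparameterization can be stripped with torch.nn.utils.remove_weight_norm, which lets the cuDNN path stay enabled:

    from torch.nn.utils import remove_weight_norm

    # Undo weight_norm on the GRU weights and on the fully-connected layer so
    # the cuDNN GRU kernels can be kept enabled. Parameter names match the
    # sketch in the question.
    for name in ['weight_ih_l0', 'weight_hh_l0']:
        remove_weight_norm(model.gru, name=name)
    remove_weight_norm(model.fc)  # default name='weight'

Stripping the reparameterization before training is effectively the same as not using weight normalization at all, so this trades the normalization for keeping the cuDNN speed-up.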