CUDA error: device-side assert triggered on Colab

I am trying to initialize a tensor on Google Colab with GPU enabled.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([1,2], device=device)

But I am getting this strange error.

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Even setting that environment variable to 1 does not seem to show any further details.
Has anyone ever had this issue?

3nomis asked Jun 28 '21 16:06


3 Answers

Your code ran without an error when I tried it. In general, though, the best way to debug a CUDA device-side assert like yours is to switch the Colab runtime to CPU and reproduce the error there. The CPU traceback is usually much more useful.

Most of the time, these CUDA runtime errors are caused by an index mismatch, for example training a network with 10 output nodes on a dataset with 15 labels. Once this error has been triggered, you will receive it for every subsequent operation on torch tensors, which forces you to restart your notebook.

I suggest you restart your notebook, get a more accurate traceback by moving to CPU, and check the rest of your code, especially if you train a model on a set of targets somewhere.
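As a minimal sketch of the index mismatch described above, the snippet below runs a cross-entropy loss on CPU with one target that is outside the range of the output nodes. The tensor shapes, the label values, and the choice of cross-entropy are assumptions made for illustration; they are not taken from the asker's code. On CPU the failure surfaces as an ordinary Python exception that names the problem, instead of a device-side assert.

```python
import torch
import torch.nn.functional as F

num_classes = 10                          # hypothetical: network with 10 output nodes
logits = torch.randn(4, num_classes)      # tensors live on the CPU by default
targets = torch.tensor([0, 3, 9, 12])     # 12 is not a valid class index for 10 outputs

# Cheap sanity check that can be run before training starts.
in_range = bool(((targets >= 0) & (targets < num_classes)).all())
print("all targets in range:", in_range)

error = None
try:
    F.cross_entropy(logits, targets)
except (IndexError, RuntimeError) as exc:  # the exception type depends on the PyTorch version
    error = exc
    print(type(exc).__name__, exc)
```

Running the same range check on the real targets before moving anything to the GPU is usually enough to find the offending label.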

SarthakJain answered Nov 18 '22 05:11


As the other respondents indicated, running it on CPU reveals the error. My target labels were {1,2}; I changed them to {0,1}, and that solved it for me.
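The relabeling from {1,2} to {0,1} can be sketched in plain Python as follows. The variable names and the sample label list are hypothetical; the idea is only that each distinct label is mapped to a consecutive index starting at 0, which is what class-index losses typically expect.

```python
raw_labels = [1, 2, 2, 1, 2]   # hypothetical labels as they appear in the dataset

# Map each distinct label to a zero-based class index, in sorted order.
classes = sorted(set(raw_labels))
label_to_index = {label: index for index, label in enumerate(classes)}

targets = [label_to_index[label] for label in raw_labels]

print(label_to_index)   # {1: 0, 2: 1}
print(targets)          # [0, 1, 1, 0, 1]
```

The resulting list can then be passed to torch.tensor as the training targets, and the mapping kept around to translate predictions back to the original labels.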

tschomacker answered Nov 18 '22 04:11


Double-check the GPU index you are using. Normally, it should be gpu=0 unless you have more than one GPU.
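One way to check this in PyTorch is to compare the requested index against torch.cuda.device_count(). The helper name resolve_device below is hypothetical and the fallback to CPU is an assumption about what is wanted; it is a sketch, not part of any library.

```python
import torch

def resolve_device(index: int = 0) -> torch.device:
    """Return cuda:<index> if that GPU is visible, otherwise fall back to the CPU."""
    count = torch.cuda.device_count()   # 0 when no CUDA device is visible
    if 0 <= index < count:
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")

device = resolve_device(0)
print("visible GPUs:", torch.cuda.device_count(), "using:", device)
```

On a standard single-GPU Colab runtime the only valid index is 0, so any other index falls back to the CPU in this sketch.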

Hoyeol Kim answered Nov 18 '22 03:11