torch.manual_seed(seed) raises RuntimeError: CUDA error: device-side assert triggered

I get this error when running my code in Google Colab. Here is the code. I can't find anything wrong with it: it ran correctly a few hours ago but suddenly started failing, and I don't know why.

import numpy as np
import torch
# Pick the GPU when one is available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
# Seed every random number generator for reproducibility.
seed = 1
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True

The error is

There are 1 GPU(s) available.
We will use the GPU: Tesla P100-PCIE-16GB
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-121-436d9d8bb120> in <module>()
      9 seed=1
     10 np.random.seed(seed)
---> 11 torch.manual_seed(seed)
     12 torch.cuda.manual_seed_all(seed)
     13 torch.backends.cudnn.deterministic = True

3 frames
/usr/local/lib/python3.7/dist-packages/torch/cuda/random.py in cb()
    109         for i in range(device_count()):
    110             default_generator = torch.cuda.default_generators[i]
--> 111             default_generator.manual_seed(seed)
    112 
    113     _lazy_call(cb, seed_all=True)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Could anyone help me?

Haorui He asked Sep 21 '25 00:09


1 Answer

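In my experience, this error usually comes from a mismatch between the label values in your targets and the number of classes your model is set up for. For example, with a model that has n outputs, a label equal to n indexes out of bounds on the GPU and trips the assert. Because CUDA errors are reported asynchronously, as the message itself notes, the traceback can point at an unrelated later call such as torch.manual_seed(seed). Once the assert has fired, later CUDA calls in the same session keep failing, so restart the Colab runtime after fixing the cause.

To solve it you can try to:

  1. Make sure the labels in your target data start from 0. If you have n classes, the target labels should be [0, 1, 2, ..., n-1].
  2. Make sure the model you are using is configured for n classes, i.e. its final layer produces n outputs.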
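
The two checks above can be sketched as a small sanity check run on the labels before training. This is only an illustration: check_labels is a hypothetical helper, not part of PyTorch, and it works on a plain list of integers so that it never touches the GPU.

```python
def check_labels(labels, num_classes):
    """Raise ValueError if any label falls outside [0, num_classes - 1].

    Hypothetical helper for illustration; returns (min_label, max_label).
    """
    labels = [int(x) for x in labels]
    if not labels:
        raise ValueError("no labels to check")
    lo, hi = min(labels), max(labels)
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the model expects [0, {num_classes - 1}]"
        )
    return lo, hi


# Labels 1..3 with a 3-class model: label 3 is out of range.
try:
    check_labels([1, 2, 3, 1], num_classes=3)
except ValueError as err:
    print(err)

# After shifting to 0-based labels the check passes.
print(check_labels([0, 1, 2, 0], num_classes=3))
```

If your targets are in a PyTorch tensor that is still on the CPU, passing targets.tolist() should work. Running the check before anything is moved to the GPU gives an ordinary Python error instead of a device-side assert.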
Carlo Longhi answered Sep 22 '25 14:09