I have the following problem. I want to allow my users to choose which GPU to run on. To see what happens if they choose a device that doesn't exist, I tested on my machine, which has only one GPU (device 0).
If I do: cudaSetDevice(0);
it works fine.
If I do: cudaSetDevice(1);
it fails with "invalid device ordinal" (I can handle this, as the function returns an error).
If I do: cudaSetDevice(0); cudaSetDevice(1);
the second call fails with "invalid device ordinal" (again, I can handle this, as the function returns an error).
However! If I do: cudaSetDevice(1); cudaSetDevice(0);
the second call returns success, but the first calculation I then try to run on my GPU fails with "invalid device ordinal". I cannot handle this, because the second call does not return an error!
It seems to me like the first cudaSetDevice leaves something lying around which affects the second command?
Thanks very much!
Solution (thanks to Robert Crovella!): I was handling the errors like this:
error = cudaSetDevice(1);
if (error) { blabla }
But apparently you also need to call cudaGetLastError() after the failed cudaSetDevice(1). Otherwise the error stays recorded as the runtime's last error, and it surfaced later on, where I was calling cudaGetLastError() after another function, even though nothing had gone wrong at that point.
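As a minimal sketch of that fix (the helper name trySetDevice is made up for illustration and is not part of the CUDA API), the return value is checked first and, on failure, the recorded error is read and reset with cudaGetLastError() so that it cannot show up in a later, unrelated check:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): tries to select a device.
// On failure it reports the error and clears the runtime's recorded
// "last error" so a later cudaGetLastError() does not return it.
bool trySetDevice(int device)
{
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess)
    {
        std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                     device, cudaGetErrorString(err));
        // cudaGetLastError() returns the last recorded error and resets
        // it to cudaSuccess; the return value is ignored here on purpose.
        cudaGetLastError();
        return false;
    }
    return true;
}
```

With a helper like this, calling trySetDevice(1) and then trySetDevice(0) on a single-GPU machine should leave no stale "invalid device ordinal" for a later cudaGetLastError() check to pick up.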
You have to check how many GPUs are available in your system first. You can do this with cudaGetDeviceCount.
int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
Then check that the user's input is less than the number of available devices.
if (userDeviceInput >= 0 && userDeviceInput < deviceCount)
{
    cudaSetDevice(userDeviceInput);
}
else
{
    printf("error: invalid device chosen\n");
}
Keep in mind that cudaSetDevice is 0-index-based! That is why I check userDeviceInput < deviceCount.
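The two steps above can be combined into one function. This is only a sketch: the name selectDevice and the wording of its messages are assumptions made for illustration, and it additionally checks the return value of cudaGetDeviceCount, which the snippets above do not do.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): selects `requested` only if
// it is a valid 0-based ordinal on this machine. Returns true on success.
bool selectDevice(int requested)
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess)
    {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }

    // Valid ordinals are 0 .. deviceCount - 1.
    if (requested < 0 || requested >= deviceCount)
    {
        std::fprintf(stderr,
                     "error: device %d requested, but %d device(s) available\n",
                     requested, deviceCount);
        return false;
    }

    err = cudaSetDevice(requested);
    if (err != cudaSuccess)
    {
        std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                     requested, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Because the range is validated before cudaSetDevice is called, an out-of-range choice is rejected without the runtime ever seeing the invalid ordinal.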