CUDA code compiled for a higher compute capability will execute perfectly for a long time on a device with a lower compute capability, before silently failing one day in some kernel. I spent half a day chasing an elusive bug, only to realize that the Build Rule had sm_21 while the device (a Tesla C2050) was compute capability 2.0.
Is there any CUDA API code I can add that self-checks whether it is running on a device with a compatible compute capability? I need to compile for and work with devices of many compute capabilities. Is there any other action I can take to ensure such errors do not occur?
In the runtime API, cudaGetDeviceProperties returns two fields, major and minor, which give the compute capability of any given enumerated CUDA device. You can use them to check the compute capability of a GPU before establishing a context on it, to make sure it is the right architecture for what your code does.
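For example, here is a minimal sketch of such a check; the required capability of 2.0 and the exit-on-mismatch policy are my own illustrative choices, not part of the original answer:

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0;
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);

    // Refuse to run if the device is older than what the code
    // was built for (2.0 here is an assumed requirement).
    const int reqMajor = 2, reqMinor = 0;
    if (prop.major < reqMajor ||
        (prop.major == reqMajor && prop.minor < reqMinor)) {
        fprintf(stderr, "This program requires compute capability "
                        "%d.%d or higher.\n", reqMajor, reqMinor);
        return 1;
    }
    return 0;
}

Failing fast with a clear message like this turns the silent kernel failure described in the question into an immediate, diagnosable error at startup.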
nvcc can generate an object file containing code for multiple architectures from a single invocation using the -gencode option, for example:
nvcc -c -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_13,code=sm_13 \
source.cu
would produce an output object file with an embedded fatbinary object containing cubin files for GT200 and GF100 cards. The runtime API will automagically handle architecture detection and try loading suitable device code from the fatbinary object without any extra host code.
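If the machine may hold several GPUs of mixed architectures, the same major and minor fields can drive device selection before any context is created. Below is a sketch under that assumption; the helper name chooseDevice and the required capability of 1.3 are hypothetical:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: return the index of the first device with at
// least the given compute capability, or -1 if none qualifies.
int chooseDevice(int reqMajor, int reqMinor)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess)
            continue;
        if (prop.major > reqMajor ||
            (prop.major == reqMajor && prop.minor >= reqMinor))
            return d;
    }
    return -1;
}

int main(void)
{
    int dev = chooseDevice(1, 3);   // e.g. require sm_13 or newer
    if (dev < 0) {
        fprintf(stderr, "No device with the required compute capability.\n");
        return 1;
    }
    cudaSetDevice(dev);
    printf("Using device %d\n", dev);
    return 0;
}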