 

What utility/binary can I call to determine an nVIDIA GPU's Compute Capability?

Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA.

I want to determine my GPU's compute capability. If I could compile code, that would be easy:

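#include <stdio.h>
#include <cuda_runtime_api.h>

int main() {
    cudaDeviceProp prop;
    // Query device 0; bail out if no usable CUDA device/driver is found
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) { return 1; }
    // Print the compute capability as a single number, e.g. 75 for 7.5
    printf("%d\n", prop.major * 10 + prop.minor);
}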
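The snippet above packs the two version components into one integer as major * 10 + minor, so compute capability 7.5 prints as 75 (the same number that appears in nvcc architecture names such as sm_75). A minimal C++ sketch of that encoding and its inverse follows; the helper names are hypothetical, and it assumes the minor version is a single digit, which has been true of every compute capability NVIDIA has published so far.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pack a compute capability the way the snippet above
// does, e.g. (7, 5) -> 75. Assumes the minor version is a single digit.
int encode_compute_capability(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 10);
    return major * 10 + minor;
}

// Hypothetical helper: inverse of encode_compute_capability(),
// e.g. 86 -> major 8, minor 6.
void decode_compute_capability(int encoded, int &major, int &minor) {
    major = encoded / 10;
    minor = encoded % 10;
}

// Hypothetical helper: render the encoded value as "major.minor",
// the form deviceQuery prints.
std::string compute_capability_string(int encoded) {
    return std::to_string(encoded / 10) + "." + std::to_string(encoded % 10);
}
```

For example, compute_capability_string(75) yields "7.5". Because the minor part never exceeds 9, the mapping stays unambiguous even for two-digit major versions (120 decodes to 12.0).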

But suppose I want to do that without compiling anything. Can I? I thought nvidia-smi might help, since it lets you query all sorts of information about devices, but it doesn't seem to expose the compute capability. Is there something else I can do? Perhaps something visible via /proc or the system logs?

Edit: This is intended to run before a build, on a system I don't control, so it must have minimal dependencies, run from the command line, and not require root privileges.

asked Nov 19 '16 16:11 by einpoklum



2 Answers

Unfortunately, it looks like the answer at the moment is "No", and that one needs to either compile a program or use a binary compiled elsewhere.

Edit: I have adapted a workaround for this issue: a self-contained bash script which compiles a small built-in C program to determine the compute capability. (It is particularly useful to call from within CMake, but it can also be run on its own.)

Also, I've filed a feature-request bug report with nVIDIA about this.

Here's the script, in a version assuming that nvcc is on your path:

//usr/bin/env nvcc --run "$0" ${1:+--run-args "${@:1}"} ; exit $?
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime_api.h>

int main(int argc, char *argv[])
{
    cudaDeviceProp prop;
    cudaError_t status;
    int device_count;
    int device_index = 0;
    if (argc > 1) {
        device_index = atoi(argv[1]);
    }

    status = cudaGetDeviceCount(&device_count);
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceCount() failed: %s\n", cudaGetErrorString(status));
        return -1;
    }
    if (device_index < 0 || device_index >= device_count) {
        fprintf(stderr, "Specified device index %d is out of range (the device count on this system is %d)\n", device_index, device_count);
        return -1;
    }
    status = cudaGetDeviceProperties(&prop, device_index);
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceProperties() for device %d failed: %s\n", device_index, cudaGetErrorString(status));
        return -1;
    }
    int v = prop.major * 10 + prop.minor;
    printf("%d\n", v);
}
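Since the script is meant to feed a build, its output (e.g. 75) usually ends up as an nvcc architecture option. The sketch below shows one way to build that option string; the helper name is hypothetical and not part of the script. It uses nvcc's -gencode arch=compute_XY,code=sm_XY form, which generates SASS for exactly that GPU; in CMake you would instead pass the bare number through CMAKE_CUDA_ARCHITECTURES.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn the integer printed by the script (e.g. 75)
// into an nvcc option that targets exactly that compute capability.
std::string nvcc_gencode_flag(int compute_capability) {
    assert(compute_capability > 0);
    const std::string cc = std::to_string(compute_capability);
    // Virtual architecture compute_XY, real (SASS) architecture sm_XY
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For instance, nvcc_gencode_flag(75) yields "-gencode arch=compute_75,code=sm_75".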
answered Nov 11 '22 23:11 by einpoklum


You can use the deviceQuery utility included in the CUDA installation:

# change cwd into the utility's source directory
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# build the deviceQuery utility with make, as root
$ sudo make

# run deviceQuery
$ ./deviceQuery  | grep Capability
  CUDA Capability Major/Minor version number:    7.5

# optionally copy deviceQuery into ~/bin for future use
$ cp ./deviceQuery ~/bin
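If a build step needs the value as a number rather than a grep'd line, the deviceQuery output can be parsed directly. This is a minimal sketch with a hypothetical helper name; it assumes the "CUDA Capability Major/Minor version number:" line format shown above and returns one entry per device, encoded as major * 10 + minor.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the compute capability of every device listed
// in deviceQuery's output, e.g. "7.5" -> 75. Non-matching lines are skipped.
std::vector<int> parse_device_query_output(const std::string &output) {
    const std::string key = "CUDA Capability Major/Minor version number:";
    std::vector<int> capabilities;
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type pos = line.find(key);
        if (pos == std::string::npos) continue;
        // The value follows the key, e.g. "    7.5"
        std::istringstream value(line.substr(pos + key.size()));
        int major = 0, minor = 0;
        char dot = '\0';
        if (value >> major >> dot >> minor && dot == '.' && minor >= 0 && minor < 10) {
            capabilities.push_back(major * 10 + minor);
        }
    }
    return capabilities;
}
```

Fed the full output below, this would return a single entry, 75, for the RTX 2080 Ti.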

The full output of deviceQuery on an RTX 2080 Ti is as follows:

 $ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          11.2 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 11016 MBytes (11551440896 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1770 MHz (1.77 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Thanks.

answered Nov 11 '22 23:11 by Hongsoog