What does compute capability 2.0 add over 1.3, 2.1 over 2.0, and 3.0 over 2.1?
The compute capability describes the feature set supported by a piece of CUDA hardware. The first CUDA-capable GPUs, such as the GeForce 8800 GTX, have a compute capability (CC) of 1.0, while more recent GeForce cards such as the GTX 480 have a CC of 2.0. Knowing the CC can be useful for understanding why a CUDA-based demo won't start on your system.
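For reference, here is a minimal sketch (not from the original answer) of how you could query the compute capability of the installed device(s) at runtime, using the standard CUDA runtime API:

```cuda
// Print the compute capability of every CUDA device in the system.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major and prop.minor together form the compute capability,
        // e.g. major = 2, minor = 0 for a GTX 480 (CC 2.0).
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```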
CUDA is NVIDIA's platform for parallel computing on GPUs (graphics processing units). It lets software developers run highly parallel algorithms on graphics hardware: a typical CPU has only 2-8 cores, while a GPU has hundreds of cores that are individually much weaker.
To execute any CUDA program, there are three main steps (see the sketch below):

1. Copy the input data from host memory to device memory, also known as host-to-device transfer.
2. Load the GPU program and execute it, caching data on-chip for performance.
3. Copy the results from device memory to host memory, also called device-to-host transfer.
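A minimal sketch of these three steps; the kernel name `scale` and the problem size are illustrative only:

```cuda
#include <cuda_runtime.h>

// Simple element-wise kernel: multiply each element by a factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // 1. Host-to-device transfer
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Load and execute the GPU program (kernel launch)
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    // 3. Device-to-host transfer
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d);
    delete[] h;
    return 0;
}
```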
CUDA-capable GPUs have hundreds of cores that can collectively run thousands of computing threads. These cores have shared resources including a register file and a shared memory. The on-chip shared memory allows parallel tasks running on these cores to share data without sending it over the system memory bus.
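As an illustration (assumed, not from the original answer) of threads in a block sharing data through on-chip shared memory rather than through the system memory bus, here is a block-level sum reduction; it assumes the kernel is launched with 256 threads per block:

```cuda
// Each block computes one partial sum entirely in shared memory.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[256];              // one element per thread in the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                         // make the shared data visible to all threads

    // Tree reduction within the block, using only on-chip shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];  // one partial sum per block
}
```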
The Compute Capabilities designate different architectures. In general, newer architectures run both CUDA programs and graphics faster than previous architectures. Note, though, that a high end card in a previous generation may be faster than a lower end card in the generation after.
From the CUDA C Programming Guide (v6.0):