
A single thread on CUDA

Tags: cuda

I am invoking a CUDA kernel with only one block and only one thread inside this block, e.g.

kernel<<<1, 1>>>

Will this kernel be executed on only a single CUDA core, as specified? For instance, if the GPU has 128 cores, will only 1 of the 128 be working?

Thanks a lot!

kostaspap asked Dec 20 '25 05:12


2 Answers

No. CUDA is a SIMD-style architecture (NVIDIA calls the model SIMT), and the basic execution unit is a warp: a group of 32 threads that execute in lockstep on the hardware. If you launch a single block containing a single thread, the hardware executes a single warp of 32 threads, 31 of which are masked out and execute the equivalent of a stream of no-ops. Any given warp runs on a single streaming multiprocessor (SM), and depending on the generation of hardware you are using, that might involve 8, 16 or 32 cores of the SM on which it runs.
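A minimal sketch for checking the masking on your own device, assuming CUDA 9 or later (for `__activemask()`) and a working device 0. The `LaunchInfo` struct and the `probe` and `query_single_thread_launch` names are made up for this illustration. The kernel records how many lanes of its warp are active and the warp width the device reports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper struct, used only for this illustration.
struct LaunchInfo {
    unsigned int active_mask;  // bitmask of lanes active in the calling warp
    int active_lanes;          // number of bits set in active_mask
    int warp_size;             // warp width as seen by the device
};

// Each set bit in __activemask() is a lane currently executing this code.
__global__ void probe(LaunchInfo *out) {
    unsigned int mask = __activemask();
    out->active_mask = mask;
    out->active_lanes = __popc(mask);
    out->warp_size = warpSize;
}

// Launches one block of one thread and copies the result back to the host.
LaunchInfo query_single_thread_launch() {
    LaunchInfo host = {0u, 0, 0};
    LaunchInfo *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(LaunchInfo)) != cudaSuccess) return host;
    probe<<<1, 1>>>(dev);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&host, dev, sizeof(LaunchInfo), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

int main() {
    LaunchInfo info = query_single_thread_launch();
    printf("warp size: %d, active lanes: %d, mask: 0x%08x\n",
           info.warp_size, info.active_lanes, info.active_mask);
    return 0;
}
```

If the description above holds, the active-lane count should be far smaller than the warp size: the warp is still scheduled as a unit, but only one lane does useful work.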

talonmies answered Dec 23 '25 04:12


Each CUDA core is a lane in an SM's SIMD unit. Your kernel activates only one SM and uses only one of its lanes, so kernel<<<1,1>>> is very inefficient.
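To put a number on "only one SM", here is a rough sketch that asks the runtime how many SMs device 0 has. It assumes a CUDA-capable device is present; `sm_count` is a hypothetical helper name, while `cudaGetDeviceProperties` and `multiProcessorCount` come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of SMs on device 0, or -1 if the query fails.
int sm_count() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

int main() {
    int sms = sm_count();
    if (sms < 0) {
        printf("could not query device 0\n");
        return 1;
    }
    // A single block is resident on exactly one SM; the rest stay idle.
    printf("A <<<1, 1>>> launch places its single block on 1 of %d SMs.\n", sms);
    return 0;
}
```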

lashgar answered Dec 23 '25 04:12


