
CUDA threads, SMX, SP and blocks, how do they work?

Tags: cuda, gpgpu, nvidia

I'm a bit confused about how CUDA works. Does every thread execute the same instruction (SIMT) on a single piece of data accessed with different indices? Or does that count as "different data", so that it is SIMD too?

Is the SMX the entire GPU chip? An SMX should consist of several SPs, each executing one thread at a time. Is a block of threads assigned to just one SP?

I'm a bit confused right now

Johnny Pauling asked Sep 08 '12 13:09



1 Answer

A grid launch is a 1- to 3-dimensional launch of thread blocks. A thread block is a 1- to 3-dimensional group of threads. The CUDA work distributor assigns thread blocks to SMX units, and each block runs on a single SMX rather than on a single SP. The SMX is not the whole chip: a low-end device may have 1 SMX unit, while a high-end device may have more than 10.
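To make the grid/block hierarchy concrete, here is a minimal host-side C++ sketch (plain CPU code, not device code) of how a 1-D launch is usually indexed. The names `Launch1D`, `blocksNeeded`, `globalIndex`, and `scaleAll` are hypothetical; in a real kernel the same values come from the built-in variables `blockIdx`, `blockDim`, and `threadIdx`. The loops run serially here, whereas the hardware runs the threads in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a 1-D launch configuration.
struct Launch1D {
    std::size_t gridDim;   // number of thread blocks in the grid
    std::size_t blockDim;  // number of threads per block
};

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocksNeeded(std::size_t n, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Same formula a kernel would use: blockIdx.x * blockDim.x + threadIdx.x.
std::size_t globalIndex(std::size_t blockIdx, std::size_t blockDim,
                        std::size_t threadIdx) {
    return blockIdx * blockDim + threadIdx;
}

// Emulates "every thread runs the same code on its own element".
std::vector<float> scaleAll(const std::vector<float>& in, float factor,
                            std::size_t threadsPerBlock) {
    std::vector<float> out(in.size());
    Launch1D launch{blocksNeeded(in.size(), threadsPerBlock), threadsPerBlock};
    for (std::size_t b = 0; b < launch.gridDim; ++b) {
        for (std::size_t t = 0; t < launch.blockDim; ++t) {
            std::size_t i = globalIndex(b, launch.blockDim, t);
            if (i < in.size()) {  // guard: the last block may be partly empty
                out[i] = in[i] * factor;
            }
        }
    }
    return out;
}
```

This is also one way to read the SIMT question: every "thread" executes the same statement, and the index `i` is what selects different data for each one.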

The SMX unit breaks thread blocks into groups of 32 threads called warps. An SMX unit can have at most 64 warps or 16 blocks resident at a time. Due to resource limitations (blocks, warps, registers per thread, shared memory per block, or barriers), the actual number may be lower.
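The arithmetic behind those limits can be sketched in a few lines of host-side C++. This is only an illustration: `SmxLimits`, `warpsPerBlock`, and `maxResidentBlocks` are hypothetical names, the two limits are the ones quoted above, and the register, shared memory, and barrier limits are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kWarpSize = 32;

// Limits quoted in the answer for one SMX; other limits are omitted here.
struct SmxLimits {
    int maxWarps = 64;
    int maxBlocks = 16;
};

// A block of N threads occupies ceil(N / 32) warps; a partially filled
// warp still takes a full warp slot.
int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Blocks that fit on one SMX when only the warp and block limits apply.
int maxResidentBlocks(int threadsPerBlock, SmxLimits limits = SmxLimits{}) {
    int byWarps = limits.maxWarps / warpsPerBlock(threadsPerBlock);
    return std::min(byWarps, limits.maxBlocks);
}
```

For example, with 64-thread blocks the block limit is the binding one: 16 blocks of 2 warps each give 32 resident warps, half of the 64-warp maximum.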

Each SMX unit has 4 warp schedulers, each responsible for a subset of the warps. On each cycle, a warp scheduler selects an eligible warp and issues 1 or 2 instructions. To dual-issue, the two instructions have to be independent and use different execution units. For example, one instruction can be dispatched to a floating-point unit and the second to the load/store unit.
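The dual-issue condition can be written down as a small predicate. The sketch below is a simplified model, not how the hardware is implemented: `Unit`, `Instr`, `dependsOn`, and `canDualIssue` are hypothetical names, and register numbers are the only dependencies it tracks.

```cpp
#include <cassert>

// Hypothetical execution unit categories, for illustration only.
enum class Unit { FloatingPoint, LoadStore, Other };

struct Instr {
    Unit unit;  // execution unit the instruction needs
    int dst;    // destination register (must be >= 0)
    int srcA;   // source registers; -1 means "unused"
    int srcB;
};

// True if `second` must wait for `first`: it reads or overwrites the
// register `first` writes, or overwrites a register `first` reads.
bool dependsOn(const Instr& first, const Instr& second) {
    assert(first.dst >= 0 && second.dst >= 0);
    bool readAfterWrite = second.srcA == first.dst || second.srcB == first.dst;
    bool writeAfterWrite = second.dst == first.dst;
    bool writeAfterRead = second.dst == first.srcA || second.dst == first.srcB;
    return readAfterWrite || writeAfterWrite || writeAfterRead;
}

// Both conditions from the text: independent, and on different units.
bool canDualIssue(const Instr& first, const Instr& second) {
    return first.unit != second.unit && !dependsOn(first, second);
}
```

In this model a floating-point multiply can pair with an unrelated load, but not with a load/store instruction that reads the multiply's result, and not with a second floating-point instruction.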

In addition to dual-issuing, a warp scheduler can issue back-to-back independent instructions from the same warp. When a dependency is detected, the execution unit of the next instruction is busy, or the warp does not have an instruction ready (it is waiting on a fetch), the warp scheduler will pick a different warp if one is eligible.
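A rough model of that selection step might look like the following. The three stall reasons are the ones listed above; the round-robin scan order is purely an assumption made for the sketch, since the actual selection policy is not described here. `WarpState`, `isEligible`, and `pickWarp` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// The three reasons from the text why a warp cannot issue this cycle.
struct WarpState {
    bool waitingOnDependency;
    bool executionUnitBusy;
    bool waitingOnFetch;
};

bool isEligible(const WarpState& w) {
    return !w.waitingOnDependency && !w.executionUnitBusy && !w.waitingOnFetch;
}

// Returns the index of the warp to issue from, or -1 if none is eligible.
// Stays on `current` while it is eligible (back-to-back issue); otherwise
// scans the remaining warps in round-robin order (an assumed policy).
int pickWarp(const std::vector<WarpState>& warps, int current) {
    const int n = static_cast<int>(warps.size());
    if (n == 0) {
        return -1;
    }
    assert(current >= 0 && current < n);
    for (int step = 0; step < n; ++step) {
        int candidate = (current + step) % n;
        if (isEligible(warps[candidate])) {
            return candidate;
        }
    }
    return -1;
}
```

The useful point is the fallback: a stalled warp costs nothing as long as some other warp is eligible, which is why having many resident warps helps hide latency.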

Each thread has its own set of general-purpose registers, condition codes, predicate codes, and local memory. Each thread is a member of a thread block. All threads in a block can access the thread block's resources, which include shared memory and barriers. All threads in a grid launch can access grid resources, which include constant memory, texture bindings, and surface bindings. All threads can access global memory.
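The visibility rules in that paragraph can be summarized as a small lookup. This is only a host-side restatement of the text, with hypothetical names (`Scope`, `ThreadId`, `sharesResource`); it does not model any real CUDA API.

```cpp
// Which level of the hierarchy a resource belongs to.
enum class Scope {
    Thread,  // registers, condition codes, predicate codes, local memory
    Block,   // shared memory, barriers
    Grid,    // constant memory, texture and surface bindings
    Global   // global memory
};

// Hypothetical identifier for one thread: which launch, block, and thread.
struct ThreadId {
    int grid;
    int block;
    int thread;
};

// True if threads a and b see the same instance of a resource at `scope`.
bool sharesResource(Scope scope, const ThreadId& a, const ThreadId& b) {
    switch (scope) {
        case Scope::Thread:
            return a.grid == b.grid && a.block == b.block && a.thread == b.thread;
        case Scope::Block:
            return a.grid == b.grid && a.block == b.block;
        case Scope::Grid:
            return a.grid == b.grid;
        case Scope::Global:
            return true;
    }
    return false;
}
```

So two threads in different blocks of the same launch share grid-level and global resources, but not shared memory or registers.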

Greg Smith answered Sep 22 '22 19:09
