
Will 32 threads from 32 blocks be scheduled as a warp?

Tags:

cuda

I understand that in CUDA, 32 adjacent threads in the same block are scheduled as a warp. But I frequently find tutorial CUDA code that launches multiple blocks with 1 thread per block. In this model, will 32 threads from 32 blocks be scheduled as a warp? If not, can I say this model is less efficient than organizing the work into 32 threads per block? Thanks!

Hailiang Zhang asked Dec 04 '12 02:12



1 Answer

No, threads from different blocks cannot be scheduled in the same warp. If you create grids of threadblocks with only a single thread each, you're definitely not getting the full performance from the machine. It's less efficient than having 32 (or an integer multiple of 32) threads per block. A warp on a Fermi SM, for example, has 32 lanes. If you are scheduling blocks of a single thread, each block still occupies a whole warp, so only 1 of those 32 lanes is doing useful work.

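Threads have a thread ID (the threadIdx built-in variable), which is defined within, and unique only within, a single block. Warps are formed from consecutive thread IDs of one block, which is why a warp cannot span blocks.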

The Hardware Multithreading section of the CUDA C Programming Guide gives a formula for the total number of warps in a single block: the number of threads per block divided by the warp size (32), rounded up to the next integer.
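As a rough illustration of that warp count, here is a minimal host-side C++ sketch. It assumes a warp size of 32, and the names kWarpSize, warps_per_block, and lane_utilization are made up for this example (they are not CUDA APIs). It only does the arithmetic; it does not launch anything on a GPU.

```cpp
#include <cassert>

// Assumed warp size (32 on NVIDIA GPUs to date; device code sees it as warpSize).
constexpr int kWarpSize = 32;

// Hypothetical helper: warps allocated for one block,
// i.e. threads per block divided by the warp size, rounded up.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: fraction of the allocated warp lanes
// that actually hold a thread.
constexpr double lane_utilization(int threads_per_block) {
    return static_cast<double>(threads_per_block) /
           (warps_per_block(threads_per_block) * kWarpSize);
}
```

With these helpers, a 1-thread block and a 32-thread block both need one warp, but the first fills 1/32 of its lanes and the second fills all of them. A 33-thread block needs two warps and fills 33 of 64 lanes, which is why multiples of 32 are the usual choice.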

Robert Crovella answered Sep 20 '22 13:09