In CUDA, do non-coalesced memory accesses cause branch divergence?

Tags: branch, cuda

I always thought that branch divergence is caused only by branching code, like "if", "else", "for", "switch", etc. However, I recently read a paper that says:

" One can clearly observe that the number of divergent branches taken by threads in each first exploration-based algorithm is at least twice more important than the full exploration strategy. This is typically the results from additional non-coalesced accesses to the global memory. Hence, such a threads divergence leads to many memory accesses that have to be serialized, increasing the total number of instructions executed.

One can observe that the number of warp serializations for the version using non-coalesced accesses is between seven and sixteen times more important than for its counterpart. Indeed, a threads divergence caused by non-coalesced accesses leads to many memory accesses that have to be serialized, increasing the instructions to be executed. "

It seems that, according to the author, non-coalesced accesses can cause divergent branches. Is that true? My question is: exactly how many causes of branch divergence are there? Thanks in advance.

Ben asked Sep 30 '13 09:09


People also ask

How does GPU handle divergent branch?

GPUs form logical groups of parallel threads belonging to the same instruction pack, named warps (or wavefront in AMD terminology) and schedule a number of them for interleaved execution on an SIMT core. This can lead to higher memory performance and reduce the problem of branch divergence.

What is thread divergence in CUDA?

Warp divergence occurs when two threads of the same warp diverge in their execution due to a branch instruction, where one thread branches and the other does not. This leads to serialization of the two threads by the CUDA hardware until their execution path converges again.

What technique does the GPU use if the execution of threads within a warp diverges?

NVIDIA GPUs execute warps of 32 parallel threads using SIMT, which enables each thread to access its own registers, to load and store from divergent addresses, and to follow divergent control flow paths.


1 Answer

I think the author is unclear on the concepts and/or terminology.

The two concepts of divergence and serialization are closely related. Divergence causes serialization, as the divergent groups of threads in a warp must be executed serially. But serialization does not cause divergence, as divergence refers specifically to threads within a warp running different code paths.
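To make the distinction concrete, here is a minimal sketch of two kernels (kernel names and the stride-based access pattern are illustrative, not taken from the paper). The first exhibits branch divergence; the second performs non-coalesced loads with no branch at all:

```cuda
// Branch divergence: threads of the same warp take different paths,
// so the warp executes both paths serially with inactive threads masked off.
__global__ void divergentKernel(int *out)
{
    int tid = threadIdx.x;
    if (tid % 2 == 0)       // even and odd lanes of a warp diverge here
        out[tid] = tid * 2;
    else
        out[tid] = tid + 1;
}

// Non-coalesced access: every thread executes the same instruction
// (no divergence), but strided addresses force the hardware to split
// the load into multiple memory transactions -- serialization without
// any divergent branch.
__global__ void uncoalescedKernel(const int *in, int *out, int stride)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid * stride];   // stride > 1 breaks coalescing
}
```

In the second kernel the warp's control flow never splits; only its memory transactions are replayed, which is exactly why the paper's wording conflates two distinct effects.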

Other things that cause serialization (but not divergence) are bank conflicts and atomic operations.
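A quick illustration of serialization without divergence (the shared-memory layout and kernel names are illustrative): with a stride of 32 floats, all 32 threads of a warp hit the same shared-memory bank, and an atomic on a single address is serialized across threads, yet in both cases every thread follows the same code path.

```cuda
// 32-way bank conflict: (tid * 32) % 32 == 0 for every lane, so all
// accesses land in bank 0 and are replayed serially. No divergence.
__global__ void bankConflictKernel(float *out)
{
    __shared__ float s[32 * 32];
    int tid = threadIdx.x;
    s[tid * 32] = (float)tid;
    __syncthreads();
    out[tid] = s[tid * 32];
}

// Atomics on one address: the hardware serializes the updates,
// but again every thread executes the same instruction stream.
__global__ void atomicKernel(int *counter)
{
    atomicAdd(counter, 1);
}
```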

Roger Dahl answered Oct 20 '22 02:10