
CUDA __threadfence()

Tags:

cuda

I have gone through many forum posts and the NVIDIA documentation, but I couldn't understand what __threadfence() does and how to use it. Could someone explain what the purpose of that intrinsic is?

kar asked Mar 08 '11 12:03



1 Answer

Normally, there is no guarantee that when one block writes something to global memory, another block will "see" it. There is also no guarantee about the order in which writes to global memory become visible, except from the point of view of the block that issued them.

There are two exceptions:

  • atomic operations - those are always visible to other blocks
  • __threadfence()

Imagine that one block produces some data and then uses an atomic operation to set a flag indicating that the data is there. Without a fence, it is possible that another block, after seeing the flag, still reads incorrect or incomplete data.

The __threadfence() function comes to the rescue by enforcing ordering: all writes made by the calling thread before the fence are observed by other blocks as happening before all of its writes made after the fence.

Note that __threadfence() doesn't necessarily need to stall the current thread until its writes to global memory are visible to all other threads in the grid. An implementation that naively did so could hurt performance severely.

As an example, if you do something like:

  1. store your data
  2. __threadfence()
  3. atomically mark a flag

it is guaranteed that if the other block sees the flag, it will also see the data.
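
The three steps above can be sketched as a small, self-contained CUDA program. This is only an illustration under stated assumptions, not code from the programming guide: the names payload, ready and producerConsumer are hypothetical, the kernel is launched with just two single-thread blocks so that both are resident at the same time (the spin-wait would hang otherwise), and the consumer-side fence and volatile reads are a conservative addition that goes beyond the three steps listed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical names, chosen for this sketch only.
__device__ int payload[4];   // data written by the producer block
__device__ int ready = 0;    // flag, accessed only through atomics

__global__ void producerConsumer(int *out)
{
    if (blockIdx.x == 0) {
        // 1. store your data
        for (int i = 0; i < 4; ++i)
            payload[i] = 10 * (i + 1);

        // 2. fence: the stores above are ordered before the flag write below
        __threadfence();

        // 3. atomically mark a flag
        atomicExch(&ready, 1);
    } else {
        // Spin until the flag is set; atomicAdd with 0 acts as an atomic read.
        // Assumption: block 0 is resident concurrently with this block.
        while (atomicAdd(&ready, 0) == 0) { }

        // Conservative addition: order the flag read before the data reads,
        // and read through a volatile pointer so the loads are not cached
        // in registers or served from stale cache lines.
        __threadfence();
        const volatile int *data = payload;

        int sum = 0;
        for (int i = 0; i < 4; ++i)
            sum += data[i];
        *out = sum;
    }
}

int main()
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));

    // Two blocks of one thread each: block 0 produces, block 1 consumes.
    producerConsumer<<<2, 1>>>(d_out);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // The producer stores 10, 20, 30, 40, so a complete read sums to 100.
    const int expected = 100;
    printf("consumer read sum = %d (%s)\n", h_out,
           h_out == expected ? "complete data" : "incomplete data");
    return h_out == expected ? 0 : 1;
}
```

Removing the __threadfence() call in the producer would leave the flag write unordered with respect to the data stores, which is exactly the situation in which the consumer may see the flag and still read stale values. Such a failure is not guaranteed to show up in any particular run.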

Further reading: CUDA Programming Guide, Chapter B.5 (as of version 11.5)

CygnusX1 answered Dec 09 '22 09:12
