
GLSL memoryBarrierShared() usefulness?

I am wondering about the usefulness of memoryBarrierShared.

When I look at the documentation for the barrier function, I read:

For any given static instance of barrier in a compute shader, all invocations within a single work group must enter it before any are allowed to continue beyond it. This ensures that values written by one invocation prior to a given static instance of barrier can be safely read by other invocations after their call to the same static instance of barrier. Because invocations may execute in undefined order between these barrier calls, the values of a per-vertex or per-patch output variable, or any shared variable will be undefined in a number of cases.

So, if we can safely read values after calling barrier, why does some code do this:

memoryBarrierShared();
barrier();

or something wrong like

barrier();
memoryBarrierShared();

So my question is: what is the purpose of memoryBarrier{Shared,...} if using barrier is enough?

For memoryBarrierBuffer/Image I can understand it if we use multiple stages, but for shared variables I have no idea...

Antoine Morrier asked Sep 08 '16 14:09


1 Answer

Update (2019-12-07):

The GLSL 4.60 clarification below is now wrong. After Revision 5, the GLSL 4.60 spec now reads:

Private GLSL issue #24: Clarify that barrier() by itself is enough to synchronize both control flow and memory accesses to shared variables and tessellation control output variables. For other memory accesses an additional memory barrier is still required.

This is also mirrored by the GLSL ES 3.20 documentation:

In order to achieve ordering with respect to reads and writes to shared variables, control flow barriers must be employed using the barrier() function (see “Shader Invocation Control Functions”).

They also go a bit further and explain

A barrier() affects control flow but only synchronizes memory accesses to shared variables and tessellation control output variables. For other memory accesses, it does not ensure that values written by one invocation prior to a given static instance of barrier() can be safely read by other invocations after their call to the same static instance of barrier(). To achieve this requires the use of both barrier() and a memory barrier.

TL;DR: If you are only using barriers for shared variables, barrier() is sufficient. If you are using them for "other memory accesses", then barrier() is not sufficient.
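The two cases in the TL;DR can be sketched in a single compute shader. This is only an illustration, not taken from the spec: the buffer names Scratch and Result, their bindings, the 64-wide work group, and a one-dimensional dispatch with buffers large enough for every invocation are all assumptions. The Scratch block is declared coherent because buffer writes that other invocations need to see generally require that qualifier in addition to the memory barrier.

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical buffers for illustration; names and bindings are assumptions.
// 'coherent' lets writes to scratch[] become visible to other invocations.
layout(std430, binding = 0) coherent buffer Scratch { float scratch[]; };
layout(std430, binding = 1) writeonly buffer Result { float result[]; };

shared float tile[64];

void main() {
    uint lid = gl_LocalInvocationID.x;
    uint gid = gl_GlobalInvocationID.x;
    uint neighbor = (lid + 1u) % gl_WorkGroupSize.x;

    // Case 1: shared variable. Under the revised wording, barrier() alone
    // orders this write against the neighbor's read after the barrier.
    tile[lid] = float(gid);
    barrier();
    float fromShared = tile[neighbor];

    // Case 2: "other memory accesses" (here, a storage buffer).
    // barrier() alone is not enough; a memory barrier must be added,
    // and it is issued before barrier().
    scratch[gid] = float(gid) * 2.0;
    memoryBarrierBuffer();
    barrier();
    float fromBuffer = scratch[gl_WorkGroupID.x * gl_WorkGroupSize.x + neighbor];

    result[gid] = fromShared + fromBuffer;
}
```

Both barrier() calls sit in uniform control flow, so every invocation in the work group reaches them. Note that this only synchronizes invocations within one work group; it says nothing about visibility across work groups or across dispatches.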


GLSL 4.60 clarifies this:

In order to achieve ordering with respect to reads and writes to shared variables, a combination of control flow and memory barriers must be employed using the barrier() and memoryBarrier() functions (see “Shader Invocation Control Functions”).

It's probably best to treat desktop GLSL as though it always said this, even though GLSL 4.50 stated it as follows.


GLSL 4.50 makes it abundantly clear that explicit memory barriers are unnecessary: barrier in a compute shader includes all memory barriers.

However, GLSL ES 3.20 makes it equally abundantly clear that barrier does not include memory barriers of any kind:

For compute shaders, a barrier only affects control flow and does not by itself synchronize memory accesses. In particular, it does not ensure that values written by one invocation prior to a given static instance of barrier() can be safely read by other invocations after their call to the same static instance of barrier(). To achieve this requires the use of both barrier() and a memory barrier.

Notably the offline glslang compiler will always use the GLSL ES wording. So if you're generating SPIR-V to feed into Vulkan, you have to follow ES's rules here. Well, until they get that fixed, one way or another.

That being said, ES's wording makes a lot more sense, as a full memory barrier for everything is quite expensive, especially if all you want to do is synchronize access to shared variables.

I would suggest using the memory barrier alongside the barrier call. That way, your shader will be correct, even if it may be slightly slower on some implementations. However, if you are going to use memory barriers along with barrier calls, then the memory barrier must come first. Executing the memory barrier after synchronizing execution is not correct.
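As a rough sketch of that conservative ordering, here is a work-group sum over a shared array with memoryBarrierShared() placed before every barrier(). The buffer names InBuf and OutBuf, their bindings, and the fixed work-group size of 64 are assumptions made for the example, and the input buffer is assumed to hold one element per invocation.

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical buffers for illustration; names and bindings are assumptions.
layout(std430, binding = 0) readonly buffer InBuf { float inData[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { float outData[]; };

shared float partial[64];

void main() {
    uint lid = gl_LocalInvocationID.x;

    // Each invocation publishes one value to shared memory.
    partial[lid] = inData[gl_GlobalInvocationID.x];

    // Memory barrier first, then the execution barrier.
    memoryBarrierShared();
    barrier();

    // Tree reduction: halve the number of active invocations each step.
    for (uint stride = gl_WorkGroupSize.x / 2u; stride > 0u; stride >>= 1u) {
        if (lid < stride) {
            partial[lid] += partial[lid + stride];
        }
        // Same order after every round of shared writes.
        memoryBarrierShared();
        barrier();
    }

    // One invocation writes the work group's total.
    if (lid == 0u) {
        outData[gl_WorkGroupID.x] = partial[0];
    }
}
```

The barrier calls are outside the if, so all invocations execute them the same number of times. On implementations that follow the revised wording, the memoryBarrierShared() calls should be redundant and could be dropped.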

Nicol Bolas answered Oct 16 '22 11:10