I can define a shared data structure (for example an array):
shared float [gl_WorkGroupSize.x]
for each workgroup. Execution order inside a workgroup is undefined, so at some point I may need to synchronize all threads that use the shared array. For example, all threads may have to write some data to the shared array before the calculations start. I found two ways to achieve this:
OpenGL SuperBible:
barrier();
memoryBarrierShared();
OpenGL 4 Shading Language Cookbook:
barrier();
Should I call memoryBarrierShared after barrier? Could you give me some practical examples of when I can use memoryBarrierShared or memoryBarrier without using barrier?
Memory barriers ensure visibility in otherwise incoherent memory access.
What this really means is that an invocation of your compute shader will not be allowed to attempt some sort of optimization that would read and/or write cached memory.
Writing to something like a Shader Storage Buffer is an example of ordinarily incoherent memory access: without a memory barrier, changes made in one invocation are only guaranteed to be visible within that invocation. Other invocations are allowed to maintain their own cached view of the memory unless you tell the GLSL compiler to enforce coherent memory access and where to do so (memoryBarrier* ()).
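As a rough illustration of a memory barrier used without barrier (), here is a minimal sketch of a "last workgroup collects the results" pattern. The block name Results, its members finished, total and perGroup, and the binding index are all hypothetical, and the sketch assumes the application zeroes finished before the dispatch and sizes perGroup to the number of workgroups. Ordering comes from the atomic counter rather than from stalling invocations, so only the visibility half is needed here.

```glsl
#version 450
layout (local_size_x = 1) in;

// Hypothetical buffer layout; `coherent` asks for coherent access to the block.
layout (std430, binding = 0) coherent buffer Results {
  uint  finished;    // assumed to be zeroed by the application before dispatch
  float total;       // written once by the last workgroup to finish
  float perGroup []; // one slot per workgroup
};

void main (void)
{
  uint group = gl_WorkGroupID.x;

  perGroup [group] = float (group); // placeholder for the real per-group result

  // Make the write above visible before the counter below is incremented.
  memoryBarrierBuffer ();

  // Each workgroup takes a ticket; the last one to arrive gets count - 1.
  uint ticket = atomicAdd (finished, 1u);

  if (ticket == gl_NumWorkGroups.x - 1u) {
    // Every other workgroup has already published its slot, so sum them up.
    float sum = 0.0;
    for (uint i = 0u; i < gl_NumWorkGroups.x; ++i)
      sum += perGroup [i];
    total = sum;
  }
}
```

No barrier () appears because no invocation waits for another: whichever workgroup happens to finish last does the final step.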
There is a serious caveat here: visibility is only half of the equation. Forcing coherent memory access when the shader is compiled does nothing to solve actual execution order issues across threads in a workgroup. To make sure that all invocations in a workgroup have finished processing up to a certain point in your shader, you must use barrier ().
#version 450
layout (local_size_x = 128) in;

shared float foobar [128]; // shared implies coherent

void main (void)
{
  foobar [gl_LocalInvocationIndex] = 0.0;

  memoryBarrierShared (); // Ensure change to foobar is visible in other invocations
  barrier ();             // Stall until every thread is finished clearing foobar

  // At this point, every index (0-127) of foobar will have the value 0.0.
  // Without the barrier, and just the memory barrier, the contents of everything
  // but foobar [gl_LocalInvocationIndex] would be undefined at this point.
}
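The same pair of calls can be reused every time invocations exchange data through the shared array. The following is a minimal sketch of a workgroup-wide sum, not a tuned implementation. The buffer blocks InputData and OutputData, their members values and sums, and the array name scratch are hypothetical, and the sketch assumes the input buffer holds at least 128 floats per dispatched workgroup.

```glsl
#version 450
layout (local_size_x = 128) in;

// Hypothetical buffers: one input value per invocation, one sum per workgroup.
layout (std430, binding = 0) readonly  buffer InputData  { float values []; };
layout (std430, binding = 1) writeonly buffer OutputData { float sums []; };

shared float scratch [128];

void main (void)
{
  uint lid = gl_LocalInvocationIndex;

  // Step 1: every invocation fills its own slot of the shared array.
  scratch [lid] = values [gl_GlobalInvocationID.x];
  memoryBarrierShared (); // make the write visible to the other invocations
  barrier ();             // wait until every slot has been written

  // Step 2: halve the number of active slots until one value remains.
  // The loop bounds are the same for all invocations, so barrier () is
  // reached in uniform control flow.
  for (uint stride = 64u; stride > 0u; stride >>= 1u) {
    if (lid < stride)
      scratch [lid] += scratch [lid + stride];
    memoryBarrierShared ();
    barrier ();
  }

  // Step 3: a single invocation publishes the workgroup's result.
  if (lid == 0u)
    sums [gl_WorkGroupID.x] = scratch [0];
}
```

Each barrier () separates a phase in which slots are written from the phase in which other invocations read them. Dropping one would let an invocation read a slot that its neighbour has not updated yet.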
Outside of GLSL, there are also barriers at the GL command level (glMemoryBarrier (...)). You would use those in situations where you need a compute shader to finish executing before GL is allowed to do something that depends on its results.
In the traditional render pipeline, GL can implicitly figure out which commands must wait for others to finish (e.g. glReadPixels (...) stalls until all commands finish writing to the framebuffer). However, with compute shaders and image load/store, implicit synchronization no longer works and you have to tell GL which pipeline memory operations must be finished and visible to the next command.