I'm implementing an algorithm, roughly like the one below, in a compute shader.
My work group size is declared as layout (local_size_x = 256) in; and I dispatch it with glDispatchCompute(1, 256, 1);.
Before reading the temp image in step 2, every pixel requires that all 8 of its neighbours have finished step 1. So I put a memoryBarrier() between step 1 and step 2, since the OpenGL Programming Guide, 8th Edition says that memory barrier functions apply globally, not just within the same local work group.
But this does not work as expected.
To demonstrate the result, consider a simplified but similar problem.
This should cause the black rectangle to grow larger and larger; instead, the rectangle loses its shape as it grows.
So, does memoryBarrier() really wait until all invocations triggered by the same glDispatchCompute call have finished their memory accesses?
After I implemented a lock between step 2 and step 3, the result was as expected (but later I found that it sometimes crashes the program by exceeding the Windows timeout limit: http://nvidia.custhelp.com/app/answers/detail/a_id/3007).
(p is the current location and p+e[i] are its 8 neighbouring pixels' locations. Instead of image variables I use a shader storage buffer object, so I added a function posi() to convert an ivec2 into an array index.)
bool finished;
do
{
    finished = true;
    // e[1]..e[8] are the 8 neighbour offsets; keep spinning until every
    // in-bounds neighbour has published its lock flag
    for(int i = 1; i < 9; i++)
    {
        if(!outOfBound(p+e[i]) && lock[posi(p+e[i])] != 1)
        {
            finished = false;
        }
    }
} while(!finished);
If I have misunderstood memoryBarrier() and it can't do what I want, is there a better way to synchronize the invocations of a compute shader?
Here is my compute shader code for the black rectangle example described above.
tag is effectively an image that tells whether each pixel is black or white; it's initialized to a small black rectangle on a white background. temp is set to zero before I run this compute shader.
The commented-out code is the lock described above; with this lock, the shader gives the desired output.
#version 430 core
layout (local_size_x = 256) in;
const ivec2 e[9] = {
    ivec2(0,0),
    ivec2(1,0), ivec2(0,1), ivec2(-1,0), ivec2(0,-1),
    ivec2(1,1), ivec2(-1,1), ivec2(-1,-1), ivec2(1,-1)
};
layout(std430, binding = 14) coherent buffer tag_buff
{
    int tag[];
};
layout(std430, binding = 15) coherent buffer temp_buff
{
    int temp[];
};
layout(std430, binding = 16) coherent buffer lock_buff
{
    int lock[];
};
int posi(ivec2 point)
{
    return point.y * 256 + point.x;
}
bool outOfBound(ivec2 p)
{
    return p.x < 0 || p.x >= 256
        || p.y < 0 || p.y >= 256;
}
void main()
{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);

    // step 1: copy this pixel's tag into temp
    int x = tag[posi(p)];
    temp[posi(p)] = x;
    //lock[posi(p)] = 1;

    memoryBarrier();

    //bool finished;
    //do
    //{
    //    finished = true;
    //    for(int i = 1; i < 9; i++)
    //    {
    //        if(!outOfBound(p+e[i]) && lock[posi(p+e[i])] != 1)
    //        {
    //            finished = false;
    //        }
    //    }
    //}while(!finished);

    // step 2: if this pixel or at least one of its 8 neighbours is black,
    // set this pixel to black
    for(int i = 0; i < 9; i++)
    {
        if(!outOfBound(p+e[i]) && temp[posi(p+e[i])] == 1)
        {
            tag[posi(p)] = 1;
        }
    }
}
Later I tried copying lock into another SSBO after setting its elements to 1 and calling memoryBarrier(), then reading that new SSBO in a fragment shader and printing it to the screen. From this I found that some elements of lock had not been set to 1.
I also tried using image variables instead of SSBOs in both the fragment shader and the compute shader, only to find that memoryBarrier and coherent changed nothing. It simply seems that memoryBarrier and coherent don't work.
memoryBarrier can't synchronize invocations by synchronizing their memory accesses. More precisely, all memoryBarrier does is wait for the completion of memory accesses that have already happened in the invocation. It will not wait for memory accesses that have not yet executed, even if they come before the memoryBarrier in the source code. The OpenGL Programming Guide says: "When memoryBarrier() is called, it ensures that any writes to memory that have been performed by the shader invocation have been committed to memory rather than lingering in caches or being scheduled after the call to memoryBarrier()". That means, for example, assuming there are three invocations: if invocations A and B have both run imageStore() on a coherent image variable, then a following memoryBarrier() in A or B guarantees that those imageStore() calls have changed the data in main memory, not just in a cache. But if invocation C has not yet run its imageStore() when A or B calls memoryBarrier(), that memoryBarrier() call will not wait for C to run its imageStore(). So memoryBarrier can't help me implement the algorithm.
I stumbled across a similar problem. I am no expert, but I believe I found a good solution.
You correctly identified memoryBarrier as necessary to ensure visibility of previous writes. However, on its own memoryBarrier is nearly useless, because it does not ensure execution ordering: even with a memoryBarrier in place, some invocations may be completely finished before others even start to run. memoryBarrier cannot make writes visible that have not yet happened.
We have barrier to remedy this:
For any given static instance of barrier in a compute shader, all invocations within a single work group must enter it before any are allowed to continue beyond it.
Note the emphasis: barrier does not help you synchronize across work groups within one glDispatchCompute call; it only synchronizes within a work group.
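To make the distinction concrete, here is a minimal sketch (not taken from the question; the shared array and the left/right test are made up for illustration) of what barrier can legitimately synchronize. With the question's layout of one 256-wide work group per row, barrier() together with memoryBarrierShared() lets an invocation safely read values written by neighbours in the same row, but it can do nothing about the rows above and below, which belong to other work groups:

#version 430 core
layout (local_size_x = 256) in;
layout(std430, binding = 14) buffer tag_buff
{
    int tag[];
};
// visible only within the current work group (one row of the image)
shared int row[256];
void main()
{
    uint x = gl_LocalInvocationID.x;
    uint idx = gl_GlobalInvocationID.y * 256u + x;
    // every invocation of this work group writes its own pixel
    row[x] = tag[idx];
    memoryBarrierShared();   // make the shared write visible ...
    barrier();               // ... and wait until the whole group has written
    // reading the left/right neighbours written by OTHER invocations of the
    // SAME work group is now safe; the rows above and below are not
    int left  = (x > 0u)   ? row[x - 1u] : 0;
    int right = (x < 255u) ? row[x + 1u] : 0;
    if (left == 1 || right == 1)
    {
        tag[idx] = 1;
    }
}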
Obviously, barrier does not help with your problem, so you introduced your own barrier, which has a disadvantage: if the driver knew about the barrier, it could schedule the invocations that have not yet reached it to run. In your solution the driver blindly schedules all invocations, wasting resources on invocations that are already spinning instead of running those that have not yet reached the barrier.
What to do instead?
To achieve a barrier across all invocations, simply issue multiple glDispatchCompute calls interleaved with appropriate glMemoryBarrier calls. The separation into multiple glDispatchCompute calls creates the execution barrier between them, and glMemoryBarrier makes the writes of the earlier invocations visible to the later ones.
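A minimal host-side sketch of that pattern (the names copyProgram and dilateProgram are assumptions, not from the question: the idea is to split the question's shader so that copyProgram performs only step 1, temp[p] = tag[p], and dilateProgram performs only step 2, the neighbour test that writes tag[p]):

#include <GL/glew.h>   /* or whichever OpenGL loader you already use */

/* One dilation step: two dispatches with a memory barrier between them.
 * Splitting step 1 and step 2 into separate dispatches is what creates
 * the "global barrier" across all 256x256 invocations. */
void dilateOnce(GLuint copyProgram, GLuint dilateProgram)
{
    /* step 1: every invocation copies its pixel from tag to temp */
    glUseProgram(copyProgram);
    glDispatchCompute(1, 256, 1);

    /* make the SSBO writes of step 1 visible to shader reads in step 2 */
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    /* step 2: every pixel can now safely read all 8 neighbours in temp,
     * because the whole first dispatch has completed */
    glUseProgram(dilateProgram);
    glDispatchCompute(1, 256, 1);
}

Growing the rectangle over several steps is then just a matter of calling this pair of dispatches once per step, with another glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) between steps so the tag written by step 2 is visible to the next step 1.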