While trying to implement a naive Compute Shader that assigns affecting lights to a cluster, i have encountered an unexpected(well for a noob like me) behavior:
I invoke this shader with glDispatchCompute(32, 32, 32); and it supposed to write a [light counter + 8 indices] for each invocation into "indices" buffer. But while debugging, i found that my writes into that buffer overlap between invocations even though I use unique clusterId. I detect it by values of indices[outIndexStart] going over 8 and visual flickering.
According to documentation, gl_GlobalInvocationID is gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID. But if set all local sizes to 1, write issues go away. Why local_size affects this code in such a way? And how can i reason about choosing it's value here?
#version 430
layout (local_size_x = 4, local_size_y = 4, local_size_z = 4) in;
uniform int lightCount;
const unsigned int clusterSize = 32;
const unsigned int clusterSquared = clusterSize * clusterSize;
struct LightInfo {
vec4 color;
vec3 position;
float radius;
};
layout(std430, binding = 0) buffer occupancyGrid {
int exists[];
};
layout(std430, binding = 2) buffer lightInfos
{
LightInfo lights [];
};
layout(std430, binding = 1) buffer outputList {
int indices[];
};
void main(){
unsigned int clusterId = gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * clusterSize + gl_GlobalInvocationID.z * clusterSquared;
if(exists[clusterId] == 0)
return;
//... not so relevant calculations
unsigned int outIndexStart = clusterId * 9;
unsigned int outOffset = 1;
for(int i = 0; i < lightCount && outOffset < 9; i++){
if(distance(lights[i].position, wordSpace.xyz) < lights[i].radius) {
indices[outIndexStart + outOffset] = i;
indices[outIndexStart]++;
outOffset++;
}
}
}
Let's look at two declarations:
layout (local_size_x = 4, local_size_y = 4, local_size_z = 4) in;
and
const unsigned int clusterSize = 32;
These say different things. The local_size
declaration says that each work group will have 4x4x4 invocations, which is 64. By contrast, your clusterSize
says that each work group will only have 32 invocations.
If you want to fix this problem, use the actual local size constant provided by the system:
const unsigned int clusterSize = gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z;
And you can even do this:
const uvec3 linearizeInvocation = uvec3{1, clusterSize, clusterSize * clusterSize};
...
unsigned int clusterId = dot(gl_GlobalInvocationID, linearizeInvocation);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With