In my fragment shader there are two lines, shown below:
float depthExp = max(0.5, pow(depth, 100.0));
gl_FragColor = vec4(depthExp * vec3(color), 1.0);
I "optimized" it into:
if (depth < 0.99309249543703590153321021688807) { // 0.5^(1/100.0)
    gl_FragColor = vec4(0.5 * vec3(color), 1.0);
} else {
    float depthExp = pow(depth, 100.0);
    gl_FragColor = vec4(depthExp * vec3(color), 1.0);
}
Can I get a performance gain from this, or does it actually work against me?
Here is the complete fragment shader, in case there is a chance to optimize it further:
varying vec2 TexCoord;

uniform sampler2D Texture_color;
uniform sampler2D Texture_depth;
uniform sampler2D Texture_stencil;

void main()
{
    float depth = texture2D(Texture_depth, TexCoord).r;
    float stencil = texture2D(Texture_stencil, TexCoord).r;
    vec4 color = texture2D(Texture_color, TexCoord);
    if (stencil == 0.0) {
        gl_FragColor = color;
    } else {
        float depthExp = max(0.5, pow(depth, 100.0));
        gl_FragColor = vec4(depthExp * vec3(color), 1.0);
    }
}
First of all, excessive branching in a shader is usually not a good idea. On modern hardware it won't be too bad as long as nearby fragments all take the same branch. But once two fragments of a local packet of fragments (whose size is implementation-dependent, probably a small square, say 4x4 to 8x8) take different branches, the GPU actually has to execute both branches for every fragment of the packet.
So if nearby fragments are likely to take the same branch, it may give some improvement. Since the condition is based on the depth (though from a previous rendering, I guess) and a depth buffer usually consists of larger regions with a monotonous depth distribution, it is indeed likely for nearby fragments to enter the same branch. And since the cheaper branch is taken for most fragments (most values will be smaller than 0.993, even more so due to the depth buffer's non-linear nature and its higher precision at smaller values), it may be profitable. But as Apeforce suggests, the best idea is to measure it.
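If divergence rather than pow() turns out to be the bottleneck, the whole shader body can also be written without any branch at all. This is only a sketch: max(0.5, pow(depth, 100.0)) is already branch-free, and the stencil test can become a mix() mask. Whether this beats the branched version is hardware-dependent, so profile both:

```glsl
// Branch-free variant (sketch). Both code paths are always evaluated,
// so this only wins if divergence, not pow(), dominates the cost.
float depthExp = max(0.5, pow(depth, 100.0));
float mask = step(0.0001, stencil);              // 0.0 where stencil == 0.0, else 1.0
vec3 rgb = mix(color.rgb, depthExp * color.rgb, mask);
gl_FragColor = vec4(rgb, mix(color.a, 1.0, mask));
```

Note that step() compares against a small epsilon instead of testing the float for exact equality with 0.0, which gives the same result here since the stencil texture only holds zero or clearly non-zero values.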
But this brings me to another question. Virtually all fragments in a usual scene will have a depth smaller than 0.993, except for the background, and most of those values will result in incredibly small numbers after exponentiating with 100 (0.95^100 ≈ 0.006 and 0.9^100 ≈ 0.00003). Scaling a color (whose precision and impact on perception is not that high in the first place) by such an amount will most probably just zero it out. So if you indeed have a standard depth buffer with values in [0,1] (and maybe even non-linear, as usual), then I wonder what the actual purpose of this pow is, and whether a different solution to your problem exists.
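For example, if the intent is to fade out distant geometry, a common approach is to linearize the depth-buffer value first and fade based on that, rather than exponentiating the raw non-linear depth. A sketch, assuming a standard perspective projection and hypothetical u_near/u_far uniforms you would have to supply:

```glsl
uniform float u_near; // hypothetical: camera near-plane distance
uniform float u_far;  // hypothetical: camera far-plane distance

// Convert a non-linear [0,1] depth-buffer value back to view-space depth,
// then normalize it to [0,1] for use as a distance-based fade factor.
float linearizeDepth(float d)
{
    float zNdc = d * 2.0 - 1.0; // back to NDC range [-1,1]
    float viewZ = (2.0 * u_near * u_far) / (u_far + u_near - zNdc * (u_far - u_near));
    return (viewZ - u_near) / (u_far - u_near);
}
```

A fade factor derived this way varies evenly with actual distance, so scaling the color by it darkens the scene gradually instead of collapsing almost everything to near-zero the way pow(depth, 100.0) does.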