Apple says in their Best Practices for Shaders to avoid branching if possible, and especially branching on values calculated within the shader. So I replaced some if statements with the built-in clamp() function. My question is: are clamp(), min(), and max() likely to be more efficient, or are they merely convenience (i.e. macro) functions that simply expand to if blocks?
I realize the answer may be implementation dependent. In any case, the functions are obviously cleaner and make the intent plain, which the compiler may be able to take advantage of.
Historically speaking, GPUs have supported per-fragment instructions such as MIN and MAX for much longer than they have supported arbitrary conditional branching. One example of this in desktop OpenGL is the GL_ARB_fragment_program extension (now superseded by GLSL), which explicitly states that it doesn't support branching, but it does provide instructions for MIN and MAX as well as some other conditional instructions.
I'd be pretty confident that all GPUs will still have dedicated hardware for these operations given how common min(), max() and clamp() are in shaders. This isn't guaranteed by the specification because an implementation can optimize code however it sees fit, but in the real world you should use GLSL's built-in functions rather than rolling your own.
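To make the comparison concrete, here is a minimal sketch of the two styles, assuming an OpenGL ES 2.0 (GLSL ES 1.00) fragment shader. The names v_value, clampBranchy and clampBuiltin are made up for illustration, and whether a given compiler actually generates different code for the two is implementation dependent.

```glsl
precision mediump float;

// Hypothetical input, interpolated from the vertex shader.
varying float v_value;

// Hand-rolled clamp that branches on a value computed in the shader.
float clampBranchy(float x, float minVal, float maxVal) {
    if (x < minVal) return minVal;
    if (x > maxVal) return maxVal;
    return x;
}

// Same result via the built-in; the spec defines clamp(x, minVal, maxVal)
// as min(max(x, minVal), maxVal).
float clampBuiltin(float x, float minVal, float maxVal) {
    return clamp(x, minVal, maxVal);
}

void main() {
    float a = clampBranchy(v_value, 0.0, 1.0);
    float b = clampBuiltin(v_value, 0.0, 1.0);
    // Write both results out so neither function is optimized away.
    gl_FragColor = vec4(a, b, 0.0, 1.0);
}
```

Both functions return the same value for any input where minVal <= maxVal; the built-in version simply states the intent directly and leaves the choice of instructions to the driver.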
The only exception would be if your conditional is being used to avoid a large amount of additional fragment processing. At some point the cost of a branch will be less than the cost of running all the code in the branch, but the balance is very hardware dependent, and you'd have to benchmark on your target hardware to see whether it actually helps in your application. Here's the kind of thing I mean:
    void main() {
        vec3 N = ...;
        vec3 L = ...;
        float NDotL = dot(N, L);
        if (NDotL > 0.0)
        {
            // Lots of very intensive code for an awesome shadowing algorithm that we
            // want to avoid wasting time on if the fragment is facing away from the light
        }
    }
Just clamping NDotL to the range 0 to 1 and then always running the shadow code on every fragment, only to multiply your final shadow term by NDotL, is a lot of wasted effort if NDotL was originally <= 0, and we can theoretically avoid this overhead with a branch. The reason this kind of thing is not always a performance win is that it is very dependent on how the hardware implements shader branching.
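For comparison, the branch-free version of that shader might look roughly like the following. It's only a sketch under a few assumptions: expensiveShadowTerm() is a hypothetical stand-in for the shadowing code, and v_normal and v_lightDir are assumed varyings that don't appear in the original snippet.

```glsl
precision mediump float;

varying vec3 v_normal;    // assumed: surface normal from the vertex shader
varying vec3 v_lightDir;  // assumed: direction from the surface to the light

// Hypothetical placeholder for the expensive shadowing algorithm.
float expensiveShadowTerm() {
    return 1.0;
}

void main() {
    vec3 N = normalize(v_normal);
    vec3 L = normalize(v_lightDir);

    // Clamp instead of branching: fragments facing away from the light get 0.0.
    float NDotL = clamp(dot(N, L), 0.0, 1.0);

    // The shadow term is evaluated for every fragment, even when NDotL is 0.0
    // and the result is about to be multiplied away.
    float shadow = expensiveShadowTerm();

    gl_FragColor = vec4(vec3(NDotL * shadow), 1.0);
}
```

This version has no conditional at all, so its cost is the same for every fragment; which of the two is faster in practice is exactly the thing you'd need to measure on the target hardware.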