Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OpenGL ES best practices for conditionals

Apple says in their Best Practices For Shaders to avoid branching if possible, and especially branching on values calculated within the shader. So I replaced some if statements with the built-in clamp() function. My question is, are clamp(), min(), and max() likely to be more efficient, or are they merely convenience (i.e. macro) functions that simply expand to if blocks?

I realize the answer may be implementation dependent. In any case, the functions are obviously cleaner and make plain the intent, which the compiler could do something with.

like image 837
David Gish Avatar asked Jul 25 '13 08:07

David Gish


1 Answers

Historically speaking GPUs have supported per-fragment instructions such as MIN and MAX for much longer than they have supported arbitrary conditional branching. One example of this in desktop OpenGL is the GL_ARB_fragment_program extension (now superseded by GLSL) which explicitly states that it doesn't support branching, but it does provide instructions for MIN and MAX as well as some other conditional instructions.

I'd be pretty confident that all GPUs will still have dedicated hardware for these operations given how common min(), max() and clamp() are in shaders. This isn't guaranteed by the specification because an implementation can optimize code however it sees fit, but in the real world you should use GLSL's built-in functions rather than rolling your own.

The only exception would be if your conditional was being used to avoid a large amount of additional fragment processing. At some point the cost of a branch will be less than the cost of running all the code in the branch, but the balance here will be very hardware dependent and you'd have to benchmark to see if it actually helps in your application on its target hardware. Here's the kind of thing I mean:

void main() {
    vec3 N = ...;
    vec3 L = ...;
    float NDotL = dot(N, L);
    if (NDotL > 0.0)
    {
        // Lots of very intensive code for an awesome shadowing algorithm that we
        // want to avoid wasting time on if the fragment is facing away from the light
    }
}

Just clamping NDotL to 0-1 and then always processing the shadow code on every fragment only to multiply through your final shadow term by NDotL is a lot of wasted effort if NDotL was originally <= 0, and we can theoretically avoid this overhead with a branch. The reason this kind of thing is not always a performance win is that it is very dependent on how the hardware implements shader branching.

like image 59
Richard Viney Avatar answered Sep 18 '22 23:09

Richard Viney