I'm interested in information about the speed of sin() and cos() in the OpenGL Shading Language (GLSL).
The GLSL Specification Document indicates that:
The built-in functions basically fall into three categories:
- ...
- ...
- They represent an operation graphics hardware is likely to accelerate at some point. The trigonometry functions fall into this category.
EDIT:
As has been pointed out, counting clock cycles of individual operations like sin() and cos() doesn't really tell the whole performance story.
So to clarify my question, what I'm really interested in is whether it's worthwhile to optimize away sin() and cos() calls for common cases.
For example, in my application it'll be very common for the argument to be 0. So does something like this make sense:
    float sina, cosa;
    if ( rotation == 0.0 )
    {
        sina = 0.0;
        cosa = 1.0;
    }
    else
    {
        sina = sin( rotation );
        cosa = cos( rotation );
    }
Or will the GLSL compiler or the sin() and cos() implementations take care of optimizations like that for me?
For example, in my application it'll be very common for the argument to be 0. So does something like this make sense:
No.
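Your compiler will do one of two things: either it will emit an actual conditional branch, or it will evaluate both sides of the condition and keep only the result of the side that applies. In the second case you have gained nothing, and in the first you gain only if neighboring shader invocations tend to take the same side of the branch.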
In general, it's not a good idea to use conditional logic to dance around small performance costs like this. The work being skipped needs to be really big for the branch to be worthwhile, like a discard or something.
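To make that concrete, here is a minimal sketch of the branch-free version, assuming sina and cosa are used to rotate a 2D vector. The helper name rotate2D and its parameters are hypothetical and do not come from the question; only sin() and cos() are GLSL built-ins.

```glsl
// Hypothetical helper: rotate a 2D vector by `rotation` radians.
// sin() and cos() are evaluated unconditionally, with no special case for 0.
vec2 rotate2D( vec2 v, float rotation )
{
    float sina = sin( rotation );
    float cosa = cos( rotation );

    // Standard 2D rotation: [ cosa -sina ; sina cosa ] * v
    return vec2( cosa * v.x - sina * v.y,
                 sina * v.x + cosa * v.y );
}
```

When rotation is 0, this returns v essentially unchanged, so the special case buys nothing in terms of correctness. Whether it would ever buy any speed is something you would have to measure on your target hardware.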
Also, do note that comparing floating-point values for exact equality is not likely to work, unless you actually pass a uniform or vertex attribute containing exactly 0.0 to the shader. Even a value interpolated between 0 and non-zero will likely never be exactly 0 for any fragment.
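If a test against zero really is needed somewhere, a tolerance comparison is the usual alternative to ==. The sketch below is only an illustration: the function name isNearZero and the EPSILON value are assumptions, and a suitable tolerance depends on the range and precision of your data.

```glsl
// Assumed tolerance; pick a value appropriate for your data.
const float EPSILON = 1e-5;

// Hypothetical helper: true when x is within EPSILON of zero.
// More robust than `x == 0.0` for interpolated or computed values.
bool isNearZero( float x )
{
    return abs( x ) < EPSILON;
}
```

This only addresses the comparison itself. It does not change the advice above about avoiding the branch.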
This is a good question. I too wondered this.
Googled links say cos and sin have been single-cycle on mainstream cards since 2005 or so.
You'd have to test this out yourself, but I'm pretty sure that branching in a shader is far more expensive than a sin or cos calculation. GLSL compilers are pretty good about optimizing shaders, so worrying about this is premature optimization. If you later find that your shaders are the bottleneck for your whole program, then you can worry about optimizing this.
If you want to take a look at the assembly code of your shader for a specific platform, I would recommend AMD GPU ShaderAnalyzer.
Not sure if this answers your question, but it's very difficult to tell you how many clocks/slots an instruction takes, as it depends very much on the GPU. Usually it's a single cycle. But even if not, the compiler may rearrange the order of instruction execution to hide the true cost. It's certainly slower to use texture lookups for sin/cos than it is to execute the instructions.