
Why is a texture lookup so much slower than a direct computation?

I'm working on an OpenGL implementation of the Oculus Rift distortion shader. The shader works by taking the input texture coordinate (of a texture containing a previously rendered scene), transforming it using distortion coefficients, and then using the transformed texture coordinate to determine the fragment color.

I'd hoped to improve performance by pre-computing the distortion and storing it in a second texture, but the result is actually slower than the direct computation.

The direct calculation version looks basically like this:

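float distortionFactor(vec2 point) {
    // Squared distance from the lens center; dot(p, p) gives it without a sqrt.
    float rSq = dot(point, point);
    // Radial polynomial K0 + K1*r^2 + K2*r^4 + K3*r^6
    float factor = K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;
    return factor;
}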

void main()
{
    vec2 distorted = vRiftTexCoord * distortionFactor(vRiftTexCoord);
    vec2 screenCentered = lensToScreen(distorted);
    vec2 texCoord = screenToTexture(screenCentered);
    vec2 clamped = clamp(texCoord, ZERO, ONE);
    if (!all(equal(texCoord, clamped))) {
        vFragColor = vec4(0.5, 0.0, 0.0, 1.0);
        return;
    }
    vFragColor = texture(Scene, texCoord);
}

where K is a vec4 that's passed in as a uniform.
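To make the arithmetic in the direct version concrete, here is a minimal C port of `distortionFactor` and of the out-of-range test. It is a sketch only: the `vec2` struct, `length_squared`, and `in_unit_square` are hypothetical stand-ins for the GLSL types and built-ins, and the question's `lensToScreen`/`screenToTexture` helpers are not modeled because their bodies are not shown.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GLSL's vec2. */
typedef struct { float x, y; } vec2;

/* Squared distance from the lens center (what GLSL's dot(p, p) returns). */
static float length_squared(vec2 p) {
    return p.x * p.x + p.y * p.y;
}

/* Radial distortion polynomial K[0] + K[1]*r^2 + K[2]*r^4 + K[3]*r^6,
 * written out term by term as in the question. */
static float distortion_factor(const float K[4], vec2 p) {
    float rSq = length_squared(p);
    return K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;
}

/* Same test as all(equal(t, clamp(t, ZERO, ONE))) for finite inputs:
 * clamping leaves t unchanged exactly when both components lie in [0, 1]. */
static bool in_unit_square(vec2 t) {
    return t.x >= 0.0f && t.x <= 1.0f && t.y >= 0.0f && t.y <= 1.0f;
}
```

With `K = {1, 2, 3, 4}` and `p = (1, 1)`, `rSq` is 2 and the factor is 1 + 4 + 12 + 32 = 49: a handful of multiplies and adds, with no memory access beyond the uniform.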

On the other hand, the displacement map lookup looks like this:

void main() {
    vec2 texCoord = vTexCoord;
    if (Mirror) {
        texCoord.x = 1.0 - texCoord.x;
    }
    texCoord = texture(OffsetMap, texCoord).rg;
    vec2 clamped = clamp(texCoord, ZERO, ONE);
    if (!all(equal(texCoord, clamped))) {
        discard;
    }
    if (Mirror) {
        texCoord.x = 1.0 - texCoord.x;
    }
    FragColor = texture(Scene, texCoord);
}
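The precomputation step the question describes (baking the distortion into `OffsetMap`) might look roughly like this on the CPU. This is a minimal sketch under stated assumptions: only the radial polynomial is modeled, lens space is assumed to be texture space remapped to [-1, 1], and the lens-offset and aspect-ratio corrections are left out. `build_offset_map` and the interleaved RG float layout are hypothetical choices, not the author's code.

```c
#include <stddef.h>

/* Radial distortion polynomial, as in distortionFactor(). */
static float distortion_factor(const float K[4], float x, float y) {
    float rSq = x * x + y * y;
    return K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;
}

/* Fills an interleaved RG buffer of width * height * 2 floats. Each texel holds
 * the Scene texture coordinate that the fragment at that texel should read.
 * ASSUMPTION: lens space == texture space remapped to [-1, 1]; no lens offset
 * or aspect-ratio correction is applied here. */
static void build_offset_map(const float K[4], int width, int height, float *rg) {
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            /* Sample at the texel center, mapped from [0, 1] to [-1, 1]. */
            float x = 2.0f * ((float)i + 0.5f) / (float)width - 1.0f;
            float y = 2.0f * ((float)j + 0.5f) / (float)height - 1.0f;
            float f = distortion_factor(K, x, y);
            /* Distort in lens space, then map back to [0, 1] texture space. */
            size_t idx = ((size_t)j * (size_t)width + (size_t)i) * 2;
            rg[idx + 0] = 0.5f * (x * f) + 0.5f;
            rg[idx + 1] = 0.5f * (y * f) + 0.5f;
        }
    }
}
```

A buffer like this would typically be uploaded with a floating-point format such as `GL_RG32F`; an 8-bit format would quantize the coordinates to 256 levels. Either way, every fragment then pays for one extra texture fetch before it can fetch the scene.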

There are a couple of other operations for correcting the aspect ratio and accounting for the lens offset, but they're pretty simple. Is it really reasonable to expect this to outperform a simple texture lookup?

Asked Dec 15 '13 06:12 by Jherico


2 Answers

GDDR memory has fairly high latency, and modern GPU architectures have plenty of number-crunching capability. It used to be the other way around: GPUs were so ill-equipped for arithmetic that normalizing a vector was cheaper to do by fetching from a cube map.

Throw in the fact that you are not doing a regular texture lookup here, but rather a dependent lookup, and it comes as no surprise. Since the location you fetch from depends on the result of another fetch, the GPU cannot prefetch or efficiently cache (an effective latency-hiding strategy) the memory your shader needs. That is no "simple texture lookup."
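A CPU analogy may help picture what "dependent" means here. This is not GPU code and makes no claim about any particular GPU's scheduler; the function names and the `(i * 3) % n` stand-in for the distortion are hypothetical. In the direct version, the address of the `Scene` read is computed arithmetically; in the lookup version, it comes out of a first memory read, so the second read cannot start until the first one has returned.

```c
#include <stddef.h>

/* Direct version: the source index is pure arithmetic on the fragment's own
 * index (a hypothetical stand-in for the distortion polynomial). */
static float direct_gather(const float *scene, size_t i, size_t n) {
    size_t src = (i * 3) % n;
    return scene[src];
}

/* Lookup version: the source index is itself loaded from memory, so the
 * second load depends on the value returned by the first (a dependent load). */
static float dependent_gather(const float *scene, const size_t *offset_map, size_t i) {
    size_t src = offset_map[i];   /* first load */
    return scene[src];            /* second load: address known only after the first */
}
```

Both functions return the same values when `offset_map` holds the precomputed indices; the difference is only in how long the second read has to wait for its address.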

What's more, in addition to the dependent texture lookup, your second shader also uses the discard keyword. This effectively rules out early depth testing on a lot of hardware.

Honestly, I do not see why you want to "optimize" the distortionFactor (...) function into a lookup. It uses squared length, so you are not even dealing with a sqrt, just a bunch of multiplication and addition.

Answered Oct 18 '22 03:10 by Andon M. Coleman


Andon M. Coleman already explained what's going on. Essentially, memory bandwidth and, more importantly, memory latency are the main bottlenecks of modern GPUs, so on everything built from about 2007 to today, simple calculations are often much faster than a texture lookup.

In fact, memory access patterns have such a large impact on efficiency that slightly rearranging the access pattern and ensuring proper alignment can easily give performance boosts of a factor of 1000 (been there, done that, though that was CUDA programming). A dependent lookup is not necessarily a performance killer, though: if the dependent texture coordinates vary monotonically with the coordinates of the controlling texture, it's usually not so bad.
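One way to read the monotonicity condition: walking along a row of the offset map, the fetched coordinate should only move forward, so neighboring fragments read neighboring texels in order. The sketch below checks that for the radial distortion along one row. It is an illustrative diagnostic under assumptions (lens space in [-1, 1], radial term only); `distorted_x` and `row_is_monotonic` are hypothetical names, and the coefficients in the usage note are example values.

```c
#include <stdbool.h>

/* x component of the radially distorted point: x * (K0 + K1 r^2 + K2 r^4 + K3 r^6). */
static float distorted_x(const float K[4], float x, float y) {
    float rSq = x * x + y * y;
    float f = K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;
    return x * f;
}

/* Returns true if, sampling the row at height y from x = -1 to x = 1,
 * the fetched x coordinate never decreases. samples must be >= 2. */
static bool row_is_monotonic(const float K[4], float y, int samples) {
    float prev = distorted_x(K, -1.0f, y);
    for (int i = 1; i < samples; ++i) {
        float x = -1.0f + 2.0f * (float)i / (float)(samples - 1);
        float cur = distorted_x(K, x, y);
        if (cur < prev) {
            return false;
        }
        prev = cur;
    }
    return true;
}
```

With non-negative coefficients such as `{1.0, 0.22, 0.24, 0.0}`, the derivative f + 2x²f′ is always positive, so every row is monotonic. A strongly negative `K[1]` (e.g. `{1, -2, 0, 0}`) folds the mapping back on itself near the edges and the check fails.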


That being said, have you heard of Horner's method? You can rewrite

float factor = K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;

trivially to

float factor = K[0] + rSq * (K[1] + rSq * (K[2] + rSq * K[3]));

This saves you a few operations: three multiplications instead of six, with the same three additions.
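A quick way to convince yourself the two forms agree is to evaluate both side by side. This C sketch mirrors the two GLSL expressions; `poly_expanded` and `poly_horner` are hypothetical names. In real arithmetic the results are identical; in floating point they can differ in the last bits, though not for the small test values used here.

```c
/* Expanded form from the question: 6 multiplications, 3 additions. */
static float poly_expanded(const float K[4], float rSq) {
    return K[0] + K[1] * rSq + K[2] * rSq * rSq + K[3] * rSq * rSq * rSq;
}

/* Horner form: 3 multiplications, 3 additions. Each step is a
 * multiply-then-add, which hardware with fused multiply-add may execute
 * as a single instruction. */
static float poly_horner(const float K[4], float rSq) {
    return K[0] + rSq * (K[1] + rSq * (K[2] + rSq * K[3]));
}
```

For `K = {1, 2, 3, 4}` and `rSq = 2`, both return 49; for `rSq = 0.5`, both return 3.25.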

Answered Oct 18 '22 02:10 by datenwolf