
Performance hit from blending large quad

I have a game that runs pretty well (55-60 FPS) on a retina display. I want to add a fullscreen overlay that blends with the existing scene. However, even when using a small texture, the performance hit is huge. Is there an optimization I can perform to make this usable?

If I use an 80x120 texture (the texture is rendered on the fly, which is why it's not square), I get 25-30 FPS. If I make the texture smaller, performance improves, but the quality is not acceptable. In general, though, the quality of the overlay is not very important (it's just lighting).

Renderer utilization is at 99%.

Even if I use a square texture from a file (.png), performance is bad.

This is how I create the texture:

    [EAGLContext setCurrentContext:context];

    // Create default framebuffer object.
    glGenFramebuffers(1, &lightFramebuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);

    // Create color render buffer and allocate backing store.
    glGenRenderbuffers(1, &lightRenderbuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, lightRenderbuffer);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8_OES, LIGHT_WIDTH, LIGHT_HEIGHT);

    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, lightRenderbuffer);

    glGenTextures(1, &lightImage);
    glBindTexture(GL_TEXTURE_2D, lightImage);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, LIGHT_WIDTH, LIGHT_HEIGHT, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, lightImage, 0);
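
One thing worth adding at this point (not shown in the question's code, included here only as a typical sanity check) is a completeness check on the new FBO:

    // Sanity check: make sure the framebuffer is complete before rendering to it.
    GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
    if (status != GL_FRAMEBUFFER_COMPLETE) {
        NSLog(@"Light framebuffer incomplete: %x", status);
    }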

And here is the rendering...

    /* Draw scene... */

    glBlendFunc(GL_ONE, GL_ONE);

    // Switch to offscreen texture buffer
    glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, lightRenderbuffer);
    glViewport(0, 0, LIGHT_WIDTH, LIGHT_HEIGHT);

    glClearColor(ambientLight, ambientLight, ambientLight, ambientLight);
    glClear(GL_COLOR_BUFFER_BIT);

    /* Draw lights to texture... */

    // Switch back to main framebuffer
    glBindFramebuffer(GL_FRAMEBUFFER, defaultFramebuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, colorRenderbuffer);
    glViewport(0, 0, framebufferWidth, framebufferHeight);

    glBlendFunc(GL_DST_COLOR, GL_ZERO);

    glBindTexture(GL_TEXTURE_2D, glview.lightImage);

    /* Set up drawing... */

    glDrawElements(GL_TRIANGLE_FAN, 4, GL_UNSIGNED_SHORT, 0);

Here are some benchmarks I took when trying to narrow down the problem. 'No blend' means I glDisable(GL_BLEND) before I draw the quad. 'No buffer switching' means I don't switch back and forth from the offscreen buffer before drawing.

(Tests using a static 256x256 .png)
No blend, No buffer switching: 52FPS
Yes blend, No buffer switching: 29FPS //disabled the glClear, which would artificially speed up the rendering
No blend, Yes buffer switching: 29FPS
Yes blend, Yes buffer switching: 27FPS

Yes buffer switching, No drawing: 46FPS

Any help is appreciated. Thanks!

UPDATE

Instead of blending the whole lightmap afterward, I ended up writing a shader to do the work on the fly. Each fragment samples and blends from the lightmap (similar to multitexturing). At first the performance gain was minimal, but once I used a lowp sampler2D for the light map, I got around 45 FPS.

Here's the fragment shader:

    lowp vec4 texColor = texture2D(tex, texCoordsVarying);
    lowp vec4 lightColor = texture2D(lightMap, worldPosVarying);
    lightColor.rgb *= lightColor.a;
    lightColor.a = 1.0;

    gl_FragColor = texColor * color * lightColor;
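
For context, the C-side setup for a shader like this just binds the lightmap on a second texture unit and points each sampler at the right unit. A minimal sketch follows; program and sceneTexture are hypothetical names for the shader program and diffuse texture, while "tex" and "lightMap" are the sampler names from the shader above and glview.lightImage is the lightmap texture from the question:

    // Bind the scene texture and the lightmap to separate texture units and
    // tell each sampler uniform which unit to read from.
    glUseProgram(program);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, sceneTexture);              // sampled by "tex"
    glUniform1i(glGetUniformLocation(program, "tex"), 0);

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, glview.lightImage);         // sampled by "lightMap"
    glUniform1i(glGetUniformLocation(program, "lightMap"), 1);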
asked Nov 20 '11 by whooops


2 Answers

OK, I think you've run up against the limitations of the hardware. Blending a screen-sized quad over the whole scene is probably a particularly bad case for tile-based hardware. The PowerVR SGX (in the iPhone) is optimized for hidden surface removal, to avoid drawing things when not needed, and it has low memory bandwidth because it's optimized for low-power devices.

So a screen-sized blended quad reads and then writes every fragment on the screen. Ouch!

The glClear speedup is related: you're telling GL you don't care about the contents of the backbuffer before rendering, which saves loading the previous contents into memory.
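
On iOS you can make that hint explicit as well. Below is a minimal sketch, assuming the EXT_discard_framebuffer extension (exposed by the SGX drivers) and a depth attachment on the main framebuffer; neither is shown in the question:

    // Clearing a freshly bound target tells the driver it does not need to
    // load the previous contents back into tile memory for this pass.
    glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
    glClear(GL_COLOR_BUFFER_BIT);
    /* ... draw lights ... */

    // At the end of the frame, attachments that will not be read again
    // (typically depth) can be discarded instead of written back to memory.
    glBindFramebuffer(GL_FRAMEBUFFER, defaultFramebuffer);
    /* ... draw scene and overlay ... */
    const GLenum discards[] = { GL_DEPTH_ATTACHMENT };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, discards);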

There's a very good overview of the iOS hardware here: http://www.imgtec.com/factsheets/SDK/POWERVR%20SGX.OpenGL%20ES%202.0%20Application%20Development%20Recommendations.1.1f.External.pdf

As for an actual solution, I would try rendering your overlay directly on the game scene.

For example, your render loop should look something like this:

    [EAGLContext setCurrentContext:context];

    // Set up the game viewport and render the game
    InitGameViewPort();
    GameRender();

    // Change camera to 2D/orthographic, turn off depth write and compare
    InitOverlayViewPort();

    // Render overlay into the same buffer
    OverlayRender();
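
Fleshing that out a little, OverlayRender() might boil down to something like this. This is only a sketch: DrawLights() is a hypothetical stand-in for your own overlay drawing, and the blend function is just one reasonable choice (it matches the question's light pass):

    // Inside OverlayRender(): draw the lighting overlay straight into the
    // same framebuffer as the scene, with an orthographic camera, depth
    // test and depth writes off, and blending on.
    glDisable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);   // additive lights; adjust to taste
    DrawLights();                  // hypothetical helper: submits the overlay geometry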
answered Nov 08 '22 by Justicle


If you render to a render target on a PowerVR chip, switch to another render target and render, then switch back to any previous render target, you will suffer a major performance hit. This kind of access pattern is labelled a "Logical Buffer Load" by the OpenGL ES Analyzer built into the latest Instruments.

If you switch your rendering order so that you draw your lightmap render target first, then render your scene to the main framebuffer, and then do your fullscreen blend of the lightmap render target texture, your performance should be much higher.
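
In outline, the reordered frame looks something like this (a sketch only, using the buffer and texture names from the question):

    // 1. Lightmap pass first, before any other render target is touched this frame.
    glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
    glViewport(0, 0, LIGHT_WIDTH, LIGHT_HEIGHT);
    glClear(GL_COLOR_BUFFER_BIT);
    /* ... draw lights to texture ... */

    // 2. Then the scene, into the main framebuffer.
    glBindFramebuffer(GL_FRAMEBUFFER, defaultFramebuffer);
    glViewport(0, 0, framebufferWidth, framebufferHeight);
    /* ... draw scene ... */

    // 3. Finally the fullscreen blend of the lightmap texture.
    glBlendFunc(GL_DST_COLOR, GL_ZERO);
    glBindTexture(GL_TEXTURE_2D, lightImage);
    /* ... draw fullscreen quad ... */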

answered Nov 08 '22 by Justin Larrabee