Why does a single Vec4 multiplication slow down my ogl es 2 fragment shader so much?

I'm writing a 2D OpenGL game for iOS devices. Right now, I'm looking at the performance on iPad (first gen). The game has code paths for ogl 1.1 and 2.0, and I can switch which one is being used with a #define.

When using ogl 2.0, the profiler tells me that my Renderer Utilization % is a fairly steady 100%, and my framerate is about 30 fps. When using ogl 1.1, the profiler tells me that my Renderer Utilization % is ~60% and my framerate is 60 fps.

I want to improve the performance with ogl 2.0. Since the game is fill-rate limited, I suspected the fragment shader. Here is the fragment shader that was being used:

precision highp float;
uniform vec4 u_color;
uniform sampler2D u_sampler0;
varying vec2 v_texCoord;

void main()
{
    gl_FragColor = u_color * texture2D( u_sampler0, v_texCoord );
}

You can see that the shader is pretty simple. It just multiplies the geometry color by the texture color. As an experiment, I removed the multiplication, so that the output color was just the texture color, like this:

precision highp float;
uniform vec4 u_color;
uniform sampler2D u_sampler0;
varying vec2 v_texCoord;

void main()
{
    gl_FragColor = texture2D( u_sampler0, v_texCoord );
}

Profiling the code using this modified shader gave a Renderer Utilization % of ~60% and a framerate of 60fps, the same performance achieved by the ogl 1.1 codepath.

My questions:

1) Should a simple Vec4 multiplication in the fragment shader really have this large a negative effect on the performance?

2) I've heard it said that on ogl es 2 devices, the 1.1 functionality is implemented with shaders. Obviously these shaders manage to efficiently achieve the effect I'm going for (blending geom color into texture color). How can I efficiently achieve this effect in my ogl 2 shader?

eroy asked Apr 01 '11 02:04



1 Answer

The reason your shader is slower than the built-in ES 1.1 shaders is that the precision highp float; declaration makes u_color a high precision value, so you are multiplying 4 high precision values (Float32), rather than 4 low precision values (fixed point, range -2 to 2).

Change uniform vec4 u_color to uniform lowp vec4 u_color and you should see the same performance as ES 1.1.
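For reference, here is a sketch of the question's shader with only that one qualifier changed. It assumes u_color is declared only in the fragment shader (if the vertex shader also declares it, the precision there would need to match), and that color components stay well inside the lowp range:

```glsl
precision highp float;          // default float precision, unchanged from the question
uniform lowp vec4 u_color;      // color components fit in lowp, so the multiply can run at low precision
uniform sampler2D u_sampler0;   // sampler2D already defaults to lowp in GLSL ES 1.00
varying vec2 v_texCoord;        // left at the default (highp) for texture coordinates

void main()
{
    // Both operands are now lowp, so the multiplication no longer needs Float32 math.
    gl_FragColor = u_color * texture2D( u_sampler0, v_texCoord );
}
```

The texture coordinate is deliberately left at the default precision, since lowp is generally too coarse for sampling positions.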

Frogblast answered Nov 15 '22 09:11
