I'm developing an Android app using OpenGL ES 2. The problem I am encountering is that the glClear() function takes so long to process that the game appears jittery as frames are delayed.
Timing probes show that setting up all vertices and images from the atlas takes less than 1 millisecond, while glClear() takes between 10 and 20 milliseconds. In fact, the clearing often takes up to 95% of the total rendering time.
My code is based on common tutorials, and the Render function is this:
private void Render(float[] m, short[] indices) {
    Log.d("time", "--START RENDER--");

    // Get handle to vertex shader's vPosition member
    int mPositionHandle = GLES20.glGetAttribLocation(riGraphicTools.sp_Image, "vPosition");
    // Enable generic vertex attribute array
    GLES20.glEnableVertexAttribArray(mPositionHandle);
    // Prepare the triangle coordinate data
    GLES20.glVertexAttribPointer(mPositionHandle, 3,
            GLES20.GL_FLOAT, true,
            0, vertexBuffer);

    // Get handle to texture coordinates location
    int mTexCoordLoc = GLES20.glGetAttribLocation(riGraphicTools.sp_Image, "a_texCoord");
    // Enable generic vertex attribute array
    GLES20.glEnableVertexAttribArray(mTexCoordLoc);
    // Prepare the texture coordinates
    GLES20.glVertexAttribPointer(mTexCoordLoc, 2, GLES20.GL_FLOAT,
            false,
            0, uvBuffer);

    // Get handle to shape's transformation matrix
    int mtrxhandle = GLES20.glGetUniformLocation(riGraphicTools.sp_Image, "uMVPMatrix");
    // Apply the projection and view transformation
    GLES20.glUniformMatrix4fv(mtrxhandle, 1, false, m, 0);

    // Get handle to texture sampler location
    int mSamplerLoc = GLES20.glGetUniformLocation(riGraphicTools.sp_Image, "s_texture");
    // Set the sampler texture unit to 0, where we have saved the texture.
    GLES20.glUniform1i(mSamplerLoc, 0);

    long clearTime = System.nanoTime();
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);
    Log.d("time", "Clear time is " + (System.nanoTime() - clearTime));

    // Draw the triangles
    GLES20.glDrawElements(GLES20.GL_TRIANGLES, indices.length,
            GLES20.GL_UNSIGNED_SHORT, drawListBuffer);

    // Disable vertex arrays
    GLES20.glDisableVertexAttribArray(mPositionHandle);
    GLES20.glDisableVertexAttribArray(mTexCoordLoc);
    Log.d("time", "--END RENDER--");
}
I have tried moving the png atlas to /drawable-nodpi but it had no effect.
I have tried using the glFlush() and glFinish() functions as well.
Interestingly, if I do not call glClear(), it appears to be called automatically: the total rendering time is still as high as when it was called, and there are no remnants of the previous frame onscreen. Only the first call to glClear() is time-consuming. If it is called again, the subsequent calls take only 1 or 2 milliseconds.
I have also tried different combinations of parameters (such as GLES20.GL_DEPTH_BUFFER_BIT), and using glClearColor(). The clear time is still high.
Thank you in advance.
You're not measuring what you think you are. Measuring the elapsed time of an OpenGL API call is mostly meaningless.
The key aspect to understand is that OpenGL is an API to pass work to a GPU. The easiest mental model (which largely corresponds to reality) is that when you make OpenGL API calls, you queue up work that will later be submitted to the GPU. For example, if you make a glDraw*() call, picture the call building a work item that gets queued up, and at some point later will be submitted to the GPU for execution.
In other words, the API is highly asynchronous. The work you request by making API calls is not completed by the time the call returns. In most cases, it's not even submitted to the GPU for execution yet. It is only queued up, and will be submitted at some point later, mostly outside your control.
A consequence of this general approach is that the time you measure to make a glClear() call has pretty much nothing to do with how long it takes to clear the framebuffer.
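To make the mental model concrete, here is a minimal sketch in plain Java. It is a toy model only, not how any real driver is implemented: the class DeferredQueueModel, its methods, and all the cost numbers are hypothetical and chosen purely for illustration. It shows that a timer wrapped around a "call" sees only the bookkeeping cost of queuing, while the actual work is accounted for later.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a deferred command queue. This is NOT how a real driver is implemented. */
final class DeferredQueueModel {
    // Assumed fixed bookkeeping cost of one API call, in microseconds (made-up number).
    static final long CALL_COST_MICROS = 5;

    // GPU cost of each queued command, in microseconds, in submission order.
    private final Deque<Long> pendingGpuCosts = new ArrayDeque<>();

    /**
     * Models an API call such as glClear(): the command is only recorded.
     * Returns the time a timer wrapped around the call would observe.
     */
    long submit(long gpuCostMicros) {
        pendingGpuCosts.addLast(gpuCostMicros);
        return CALL_COST_MICROS;
    }

    /** Models the GPU executing everything queued so far; returns the total GPU time. */
    long drain() {
        long total = 0;
        while (!pendingGpuCosts.isEmpty()) {
            total += pendingGpuCosts.removeFirst();
        }
        return total;
    }

    public static void main(String[] args) {
        DeferredQueueModel queue = new DeferredQueueModel();
        long clearCall = queue.submit(2_000); // hypothetical clear: 2 ms of GPU work
        long drawCall = queue.submit(6_000);  // hypothetical draw: 6 ms of GPU work
        System.out.println("measured around calls: " + (clearCall + drawCall) + " us");
        System.out.println("GPU work executed later: " + queue.drain() + " us");
    }
}
```

In this model the number returned by submit() is the same regardless of how expensive the queued command is, which is the sense in which timing an individual call says little about the cost of the work it requests.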
Now that we have established that the OpenGL API is asynchronous, the next concept to understand is that a certain level of synchronization is necessary.
Let's look at a workload where the overall throughput is limited by the GPU (either by GPU performance, or because the frame rate is capped by the display refresh). If we kept the whole system entirely asynchronous, and the CPU can produce GPU commands faster than the GPU can process them, we would be queuing up a gradually increasing amount of work. This is undesirable because the queued work takes up a growing amount of memory, and because the latency between issuing commands and seeing their result on screen keeps growing.
To avoid this, drivers use throttling mechanisms to prevent the CPU from getting too far ahead. The details of how exactly this is handled can be fairly complex. But as a simple model, it might be something like blocking the CPU when it gets more than 1-2 frames ahead of what the GPU has finished rendering. Ideally, you always want some work queued up so that the GPU never goes idle in graphics-limited apps, but you want to keep the amount of queued up work as small as possible to minimize memory usage and latency.
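The simple throttling model above can be sketched as a small deterministic simulation in plain Java. Everything here is an assumption made for illustration: the class ThrottleModel, the rule that the block happens at the first call of a frame, and the timings (1 ms of CPU work and 16 ms of GPU work per frame, at most 2 frames ahead). Real drivers are more complex and differ between vendors.

```java
/** Toy simulation of driver throttling. Timings and the blocking rule are assumptions. */
final class ThrottleModel {

    /**
     * Returns, for each frame, how long the CPU is blocked at the start of that frame
     * (in microseconds) when it may run at most maxFramesAhead frames ahead of the GPU.
     */
    static long[] blockedMicrosPerFrame(int frames, long cpuFrameMicros,
                                        long gpuFrameMicros, int maxFramesAhead) {
        long[] gpuDone = new long[frames]; // time at which the GPU finishes each frame
        long[] blocked = new long[frames];
        long cpuNow = 0;                   // virtual CPU clock

        for (int i = 0; i < frames; i++) {
            // Before starting frame i, frame (i - maxFramesAhead) must be finished.
            int mustBeDone = i - maxFramesAhead;
            if (mustBeDone >= 0 && gpuDone[mustBeDone] > cpuNow) {
                blocked[i] = gpuDone[mustBeDone] - cpuNow;
                cpuNow = gpuDone[mustBeDone];
            }

            // The CPU issues the commands for frame i.
            cpuNow += cpuFrameMicros;

            // The GPU starts frame i once it is submitted and the previous frame is done.
            long gpuStart = (i == 0) ? cpuNow : Math.max(cpuNow, gpuDone[i - 1]);
            gpuDone[i] = gpuStart + gpuFrameMicros;
        }
        return blocked;
    }

    public static void main(String[] args) {
        long[] blocked = blockedMicrosPerFrame(6, 1_000, 16_000, 2);
        for (int i = 0; i < blocked.length; i++) {
            System.out.println("frame " + i + ": blocked " + blocked[i] + " us");
        }
    }
}
```

With these assumed numbers the first two frames do not block, and every later frame blocks for 15 ms at its first call, even though the CPU-side work per frame is only 1 ms. The blocked time in this model is simply the gap between GPU and CPU time per frame.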
With all this background information explained, your measurements should be much less surprising. By far the most likely scenario is that your glClear() call triggers a synchronization, and the time you measure is the time it takes the GPU to catch up sufficiently, until it makes sense to submit more work.
Note that this does not mean that all the previously submitted work needs to complete. Let's look at a sequence that is somewhat hypothetical, but realistic enough to illustrate what can happen:
1. You make the glClear() call that forms the start of rendering frame n.
2. At this time, frame n - 3 is on the display, and the GPU is busy processing rendering commands for frame n - 2.
3. The driver blocks in your glClear() call until the GPU has finished the rendering commands for frame n - 2.
4. It may also wait until frame n - 2 is shown on the display, which means waiting for the next beam sync.
5. Now that frame n - 2 is on the display, the buffer that previously contained frame n - 3 is not used anymore. It is now ready to be used for frame n, which means that the glClear() command for frame n can now be submitted.
Note that while your glClear() call did all kinds of waiting in this scenario, which you measure as part of the elapsed time spent in the API call, none of this time was used for actually clearing the framebuffer for your frame. You were probably just sitting on some kind of semaphore (or similar synchronization mechanism), waiting for the GPU to complete previously submitted work.
Considering that your measurement is not directly helpful after all, what can you learn from it? Unfortunately not a whole lot.
If you do observe that your frame rate does not meet your target, e.g. because you observe stuttering, or even better because you measure the framerate over a certain time period, the only thing you know for sure is that your rendering is too slow. Going into the details of performance analysis is a topic that is much too big for this format. Just to give you a rough overview of steps you could take:
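Measuring the frame rate over a time period, as mentioned above, could be done with a small helper like the following. This is only a sketch: the class FrameRateMeter and its method names are hypothetical, and the caller is assumed to pass in timestamps from System.nanoTime() once per frame.

```java
/** Averages the frame rate over fixed time windows. Hypothetical helper, for illustration. */
final class FrameRateMeter {
    private final long windowNanos;
    private long windowStartNanos = -1; // -1 means no frame has been seen yet
    private int framesInWindow = 0;

    FrameRateMeter(long windowNanos) {
        this.windowNanos = windowNanos;
    }

    /**
     * Call once per frame with the current time in nanoseconds.
     * Returns the average frames per second when a window completes, otherwise -1.
     */
    double onFrame(long nowNanos) {
        if (windowStartNanos < 0) {
            windowStartNanos = nowNanos; // first frame only starts the window
            return -1;
        }
        framesInWindow++;
        long elapsed = nowNanos - windowStartNanos;
        if (elapsed < windowNanos) {
            return -1;
        }
        double fps = framesInWindow * 1e9 / elapsed;
        windowStartNanos = nowNanos;
        framesInWindow = 0;
        return fps;
    }

    public static void main(String[] args) {
        FrameRateMeter meter = new FrameRateMeter(1_000_000_000L); // 1 second window
        // Simulated frames arriving every 20 ms.
        for (long t = 0; t <= 2_000_000_000L; t += 20_000_000L) {
            double fps = meter.onFrame(t);
            if (fps >= 0) {
                System.out.println("average fps: " + fps);
            }
        }
    }
}
```

In an Android renderer, one place to call onFrame(System.nanoTime()) would be at the start of onDrawFrame(). Because the result is averaged over whole frames, it is not distorted by where in the frame the driver happens to block.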