
Minimize Android GLSurfaceView lag

Following some other questions on Stack Overflow, I've read the guide to the internals of Android Surfaces, SurfaceViews, and related classes from here:

https://source.android.com/devices/graphics/architecture.html

That guide has given me a much improved understanding of how all the different pieces fit together on Android. It covers how eglSwapBuffers just pushes the rendered frame into a queue which will later be consumed by SurfaceFlinger when it prepares the next frame for display. If the queue is full, then it will wait until a buffer becomes available for the next frame before returning. The document above describes this as "stuffing the queue" and relying on the "back-pressure" of swap buffers to limit the rendering to the vsync of the display. This is what happens using the default continuous render mode of the GLSurfaceView.

If your rendering is simple and completes in much less than the frame period, the downside of this is additional lag caused by the BufferQueue: the wait in SwapBuffers doesn't happen until the queue is full, so the frame we're rendering is always destined for the back of the queue. It therefore won't be displayed on the very next vsync, as there are likely buffers ahead of it in the queue.

In contrast, rendering on demand typically happens much less frequently than the display update rate, so the BufferQueues for those views are typically empty, and any update pushed into them will be grabbed by SurfaceFlinger on the very next vsync.

So here's the question: How can I set up a continuous renderer, but with minimal lag? The goal is that the buffer queue is empty at the start of each vsync, I render my content in under 16ms, push it to the queue (buffer count = 1), and it is then consumed by SurfaceFlinger on the next vsync (buffer count = 0), repeat. The number of Buffers in the queue can be seen in systrace, so the goal is to have this alternate between 0 and 1.

The document I mention above introduces Choreographer as a way to get callbacks on each vsync. However I'm not convinced that is enough to be able to achieve the minimal lag behaviour I'm after. I have tested doing a requestRender() on a vsync callback with a very minimal onDrawFrame() and it does indeed exhibit the 0/1 buffer count behaviour. However what if SurfaceFlinger isn't able to do all of its work within a single frame period (perhaps a notification pops in or whatever)? In that case I expect my renderer will happily be producing 1 frame per vsync, but the consumer end of that BufferQueue has dropped a frame. Result: we're now alternating between 1 and 2 buffers in our queue, and we've gained a frame of lag between doing the rendering and seeing the frame.
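
For reference, here's a minimal sketch of that kind of setup (the class name is illustrative, and the GLSurfaceView is assumed to already have its renderer attached):

```java
import android.opengl.GLSurfaceView;
import android.view.Choreographer;

// Drives a GLSurfaceView from the vsync callback so that at most one frame
// is queued per display refresh, instead of stuffing the BufferQueue.
public class VsyncDrivenRenderer implements Choreographer.FrameCallback {
    private final GLSurfaceView mGlView;

    public VsyncDrivenRenderer(GLSurfaceView glView) {
        mGlView = glView;
        // Only render when explicitly requested, not continuously.
        mGlView.setRenderMode(GLSurfaceView.RENDERMODE_WHEN_DIRTY);
    }

    public void start() {
        // Must be called from a thread with a Looper (e.g. the main thread).
        Choreographer.getInstance().postFrameCallback(this);
    }

    @Override
    public void doFrame(long frameTimeNanos) {
        // Kick off one render for this vsync, then re-arm for the next one.
        mGlView.requestRender();
        Choreographer.getInstance().postFrameCallback(this);
    }
}
```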

The document appears to suggest looking at the time offset between the reported vsync time and when the callback is run. I can see how that can help if your callback is delivered late because your main thread was busy with a layout pass or something. However, I don't think that would allow detecting SurfaceFlinger skipping a beat and failing to consume a frame. Is there any way the app can work out that SurfaceFlinger has dropped a frame? It also seems like the inability to tell the length of the queue breaks the idea of using the vsync time for game-state updates, as there's an unknown number of frames in the queue before the one you're rendering will actually be displayed.
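
For concreteness, here's a sketch of the kind of check the doc seems to suggest. Note that it only detects the callback itself being delivered late, not SurfaceFlinger failing to consume a frame; FRAME_PERIOD_NANOS is an assumed constant of roughly one 60 Hz refresh period:

```java
import android.view.Choreographer;

public class LatencyAwareCallback implements Choreographer.FrameCallback {
    // Assumed constant: roughly one refresh period at 60 Hz.
    private static final long FRAME_PERIOD_NANOS = 16_666_667L;

    @Override
    public void doFrame(long frameTimeNanos) {
        // Both timestamps share the CLOCK_MONOTONIC / System.nanoTime() timebase.
        long latenessNanos = System.nanoTime() - frameTimeNanos;
        if (latenessNanos > FRAME_PERIOD_NANOS) {
            // The callback ran at least a full frame after the vsync it was
            // scheduled for (e.g. the main thread was busy with a layout pass),
            // so a frame rendered now is already behind schedule.
        }
        // ... request the render as usual, then re-post the callback ...
    }
}
```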

Reducing the maximum length of the queue and relying on the back-pressure would be one way to achieve this, but I don't think there's an API to set the maximum number of buffers in the GLSurfaceView's BufferQueue -- is there?

asked Oct 11 '14 by tangobravo

1 Answer

Great question.

Quick bit of background for anyone else reading this:

The goal here is to minimize the display latency, i.e. the time between when the app renders a frame and when the display panel lights up the pixels. If you're just throwing content at the screen, it doesn't matter, because the user can't tell the difference. If you're responding to touch input, though, every frame of latency makes your app feel just a bit less responsive.

The problem is similar to A/V sync, where you need the audio associated with a frame to come out of the speaker as that video frame is being displayed on screen. In that case the overall latency doesn't matter, so long as it's consistently equal on both the audio and video outputs. It faces very similar problems though, because you'll lose sync if SurfaceFlinger stalls and your video ends up consistently displayed one frame late.

SurfaceFlinger runs at elevated priority, and does relatively little work, so it isn't likely to miss a beat on its own... but it can happen. Also, it is compositing frames from multiple sources, some of which use fences to signal asynchronous completion. If an on-time video frame is composited with OpenGL ES output, and the GLES rendering hasn't completed when the deadline hits, the whole composition will be postponed to the next VSYNC.

The desire to minimize latency was strong enough that the Android KitKat (4.4) release introduced the "DispSync" feature in SurfaceFlinger, which shaves half a frame of latency off the usual two-frame delay. (This is briefly mentioned in the graphics architecture doc, but it's not in widespread use.)

So that's the situation. In the past this was less of an issue for video, because 30fps video updates every other frame. Hiccups work themselves out naturally because we're not trying to keep the queue full. We're starting to see 48Hz and 60Hz video though, so this matters more.

The question is, how do we detect if the frames we send to SurfaceFlinger are being displayed as soon as possible, or are spending an extra frame waiting behind a buffer we sent previously?

The first part of the answer is: you can't. There is no status query or callback on SurfaceFlinger that will tell you what its state is. In theory you could query the BufferQueue itself, but that won't necessarily tell you what you need to know.

The problem with queries and callbacks is that they can't tell you what the state is, only what the state was. By the time the app receives the information and acts on it, the situation may be completely different. The app will be running at normal priority, so it's subject to delays.

For A/V sync it's slightly more complicated, because the app can't know the display characteristics. For example, some displays have "smart panels" that have memory built in to them. (If what's on the screen doesn't update often, you can save a lot of power by not having the panel scan the pixels across the memory bus 60x per second.) These can add an additional frame of latency that must be accounted for.

The solution Android is moving toward for A/V sync is to have the app tell SurfaceFlinger when it wants the frame to be displayed. If SurfaceFlinger misses the deadline, it drops the frame. This was added experimentally in 4.4, though it's not really intended to be used until the next release (it should work well enough in "L preview", though I don't know if that includes all of the pieces required to use it fully).

The way an app uses this is to call the eglPresentationTimeANDROID() extension before eglSwapBuffers(). The argument to the function is the desired presentation time, in nanoseconds, using the same timebase as Choreographer (specifically, Linux CLOCK_MONOTONIC). So for each frame, you take the timestamp you got from the Choreographer, add the desired number of frames multiplied by the approximate refresh period (which you can get by querying the Display object -- see MiscUtils#getDisplayRefreshNsec() in Grafika), and pass it to EGL. When you swap buffers, the desired presentation time is passed along with the buffer.
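
A rough sketch of that call sequence using the Java EGL14/EGLExt bindings (the same extension is also callable from native EGL; REFRESH_PERIOD_NANOS and FRAMES_AHEAD are assumed values here):

```java
import android.opengl.EGL14;
import android.opengl.EGLDisplay;
import android.opengl.EGLExt;
import android.opengl.EGLSurface;

public class ScheduledSwapHelper {
    // Assumed values; derive the real refresh period from the Display object,
    // as in Grafika's MiscUtils#getDisplayRefreshNsec().
    private static final long REFRESH_PERIOD_NANOS = 16_666_667L;
    private static final int FRAMES_AHEAD = 2;

    // frameTimeNanos is the Choreographer timestamp (CLOCK_MONOTONIC).
    public void swapWithPresentationTime(EGLDisplay display, EGLSurface surface,
                                         long frameTimeNanos) {
        long presentationTimeNanos =
                frameTimeNanos + FRAMES_AHEAD * REFRESH_PERIOD_NANOS;
        // Tag the buffer with the desired display time, then queue it.
        EGLExt.eglPresentationTimeANDROID(display, surface, presentationTimeNanos);
        EGL14.eglSwapBuffers(display, surface);
    }
}
```

If you're using a GLSurfaceView, which performs the swap for you after onDrawFrame() returns, you could instead make the eglPresentationTimeANDROID() call from inside onDrawFrame(), using EGL14.eglGetCurrentDisplay() and EGL14.eglGetCurrentSurface(EGL14.EGL_DRAW) to obtain the handles.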

Recall that SurfaceFlinger wakes up once per VSYNC, looks at the collection of pending buffers, and delivers a set to the display hardware via Hardware Composer. If you request display at time T, and SurfaceFlinger believes that a frame passed to the display hardware will display at time T-1 or earlier, the frame will be held (and the previous frame re-shown). If the frame will appear at time T, it will be sent to the display. If the frame will appear at time T+1 or later (i.e. it will miss its deadline), and there's another frame behind it in the queue that is scheduled for a later time (e.g. the frame intended for time T+1), then the frame intended for time T will be dropped.

The solution doesn't perfectly suit your problem. For A/V sync, you need constant latency, not minimum latency. If you look at Grafika's "scheduled swap" activity you can find some code that uses eglPresentationTimeANDROID() in a way similar to what a video player would do. (In its current state it's little more than a "tone generator" for creating systrace output, but the basic pieces are there.) The strategy there is to render a few frames ahead, so SurfaceFlinger never runs dry, but that's exactly wrong for your app.

The presentation-time mechanism does, however, provide a way to drop frames rather than letting them back up. If you happen to know that there are two frames of latency between the time reported by Choreographer and the time when your frame can be displayed, you can use this feature to ensure that frames will be dropped rather than queued if they are too far in the past. The Grafika activity allows you to set the frame rate and requested latency, and then view the results in systrace.

It would be helpful for an app to know how many frames of latency SurfaceFlinger actually has, but there isn't a query for that. (This is somewhat awkward to deal with anyway, as "smart panels" can change modes, thereby changing the display latency; but unless you're working on A/V sync, all you really care about is minimizing the SurfaceFlinger latency.) It's reasonably safe to assume two frames on 4.3+. If it's not two frames, you may have suboptimal performance, but the net effect will be no worse than you would get if you didn't set the presentation time at all.

You could try setting the desired presentation time equal to the Choreographer timestamp; a timestamp in the recent past means "show ASAP". This ensures minimum latency, but can backfire on smoothness. SurfaceFlinger has the two-frame delay because it gives everything in the system enough time to get work done. If your workload is uneven, you'll wobble between single-frame and double-frame latency, and the output will look janky at the transitions. (This was a concern for DispSync, which reduces the total time to 1.5 frames.)

I don't remember when the eglPresentationTimeANDROID() function was added, but on older releases it should be a no-op.

Bottom line: for 'L', and to some extent 4.4, you should be able to get the behavior you want using the EGL extension with two frames of latency. On earlier releases there's no help from the system. If you want to make sure there isn't a buffer in your way, you can deliberately drop a frame every so often to let the buffer queue drain.

Update: one way to avoid queueing up frames is to call eglSwapInterval(0). If you were sending output directly to a display, the call would disable synchronization with VSYNC, un-capping the application's frame rate. When rendering through SurfaceFlinger, this puts the BufferQueue into "async mode", which causes it to drop frames if they're submitted faster than the system can display them.

Note you're still triple-buffered: one buffer is being displayed, one is held by SurfaceFlinger to be displayed on the next flip, and one is being drawn into by the application.
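
A minimal sketch of that, assuming a GLES 2 GLSurfaceView and the EGL14 bindings (the swap interval applies to the window surface that's current on the renderer thread):

```java
import android.opengl.EGL14;
import android.opengl.GLSurfaceView;

import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;

public class AsyncModeRenderer implements GLSurfaceView.Renderer {
    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        // The GLSurfaceView's EGL context is current on this thread, so a
        // swap interval of 0 puts the window's BufferQueue into async mode.
        EGL14.eglSwapInterval(EGL14.eglGetCurrentDisplay(), 0);
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int width, int height) { }

    @Override
    public void onDrawFrame(GL10 gl) {
        // ... draw the scene; GLSurfaceView swaps after this returns ...
    }
}
```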

answered by fadden