Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Android triple buffering - expected behavior?

I'm investigating the performance of my app since I noticed it dropping some frames while scrolling. I ran systrace (on a Nexus 4 running 4.3) and noticed an interesting section in the output.

Everything is fine at first. Zooming in on the left section, we can see that drawing starts on every vsync, finishes with time to spare, and waits until the next vsync. Since it's triple buffered, it should be drawing into a buffer that will be posted on the following vsync after it's done.

On the 4th vsync in the zoomed in screenshot, the app does some work and the draw operation doesn't finish in time for the next vsync. However, we don't drop any frames because the previous draws were working a frame ahead.

After this happens though, the draw operations don't make up for the missed vsync. Instead, only one draw operation starts per vsync, and now they're not drawing one frame ahead anymore.

Zooming in on the right section, the app does some more work and misses another vsync. Since we weren't drawing a frame ahead, a frame actually gets dropped here. After this, it goes back to drawing one frame ahead.

Is this expected behavior? My understanding was that triple buffering allowed you to recover if you missed a vsync, but this behavior looks like it drops a frame once every two vsyncs you miss.


Follow up questions

  1. On the right side of this screenshot, the app is actually rendering buffers faster than the display is consuming them. During performTraversals #1 (labeled in the screenshot), let's say buffer A is being displayed and buffer B is being rendered. #1 finishes long before the vsync and puts buffer B in the queue. At this point, shouldn't the app be able to immediately start rendering buffer C? Instead, performTraversals #2 doesn't start until the next vsync, wasting the precious time in between.

  2. In a similar vein, I'm a bit confused about the need for waitForever on the left side here. Let's say buffer A is being displayed, buffer B is in the queue, and buffer C is being rendered. When buffer C is finished rendering, why isn't it immediately added to the queue? Instead it does a waitForever until buffer B is removed from the queue, at which point it adds buffer C, which is why the queue seems to always stay at size 1 no matter how fast the app is rendering buffers.

like image 758
peterb_sm Avatar asked Apr 30 '14 00:04

peterb_sm


People also ask

Is triple buffering good for performance?

The short answer is that triple buffering is a technology that can smooth out video game framerate and reduce screen tearing, but it can also cause notable input lag in some situations. Triple buffering also often requires a fairly powerful PC to run properly. That's the abbreviated short-and-sweet.

Does triple buffering increase FPS?

With triple buffering enabled, the game renders a frame in one back buffer. While it is waiting to flip, it can start rendering in the other back buffer. The result is that the frame rate is typically higher than double buffering (and Vsync enabled) without any tearing.

Is triple buffering Vsync?

Triple buffering solves the latency issue of vsync latency by trading off buring additional cpu/gpu time, bringing the worst-case latency back to 1 frame.


1 Answers

The amount of buffering provided only matters if you keep the buffers full. That means rendering faster than the display is consuming them.

The labels don't appear in your images, but I'm guessing that the purple row above the green vsync row is the BufferQueue status. You can see that it generally has 0 or 1 full buffers at any time. At the very left of the "zoomed-in on the left" image you can see that it's got two buffers, but after that it only has one, and 3/4 of the way across the screen you see a very short purple bar the indicates it just barely rendered the frame in time.

See this post and this post for background.

Update for the added questions...

The detail in the other post barely scratched the surface. We must go deeper.

The BufferQueue count shown in systrace is the number of queued buffers, i.e. the number of buffers that have content in them. When SurfaceFlinger grabs a buffer for display, it releases the buffer immediately, changing its state to "free". This is particularly exciting when the buffer is being shown on an overlay, because the display is rendering directly from the buffer (as opposed to compositing into a scratch buffer and displaying that).

Let me say that again: the buffer from which the display is actively reading data for display on the screen is marked as "free" in the BufferQueue. The buffer has an associated fence that is initially "active". While it's active, nobody is allowed to modify the buffer contents. When the display no longer needs the buffer, it signals the fence.

So the reason why the code over on the left of your trace is in waitForever() is because it's waiting for the fence to signal. When VSYNC hits, the display switches to a different buffer, signals the fence, and your app can start using the buffer immediately. This eliminates the latency that would be incurred if you had to wait for SurfaceFlinger to wake up, see that the buffer was no longer in use, send an IPC through BufferQueue to release the buffer, etc.

Note that the calls to waitForever() only show up when you're not falling behind (left side and right side of the trace). I'm not sure offhand why it's happening at all when the queue has only 1 full buffer -- it should be dequeueing the oldest buffer, which should already have signaled.

The bottom line is that you'll never see the BufferQueue go above two for triple buffering.

Not all devices work as described above. Nexus 7 (2012) doesn't use the "explicit sync" mechanism, and pre-ICS devices don't have BufferQueues at all.

Going back to your numbered screenshot, yes, there's plenty of time between '1' and '2' where your app could run performTraversals(). It's hard to say for sure without knowing what your app is doing, but I would guess you've got a Choreographer-driven animation cycle that wakes up every VSYNC and does work. It doesn't run more often than that.

If you systrace Android Breakout you can see what it looks like when you render as fast as you can ("queue stuffing") and rely on BufferQueue back-pressure to regulate the game speed.

It's especially interesting to compare N4 running 4.3 with N4 running 4.4. On 4.3, the trace is similar to yours, with the queue largely hovering at 1, with regular drops to 0 and occasional spikes to 2. On 4.4, the queue is almost always at 2 with an occasional drop to 1. In both cases it's sleeping in eglSwapBuffers(); in 4.3 the trace usually shows waitForever() below that, while in 4.4 it shows dequeueBuffer(). (I don't know the reason for this offhand.)

Update 2: The reason for the difference between 4.3 and 4.4 appears to be a Nexus 4 driver change. The 4.3 driver used the old dequeueBuffer call, which turns into dequeueBuffer_DEPRECATED() (Surface.cpp line 112). The old interface doesn't take the fence as an "out" parameter, so the call has to call waitForever() itself. The newer interface just returns the fence to the GL driver, which does the wait when it needs to (which might not be right away).

Update 3: An even longer explanation is now available here.

like image 65
fadden Avatar answered Oct 25 '22 15:10

fadden