 

iPhone OpenGL ES performance tuning

I've been trying for quite a while to optimize the frame rate of my game without making real progress. I'm running the newest iPhone SDK and have an iPhone 3G running 3.1.2.

I issue around 150 draw calls, rendering about 1900 triangles in total. All objects are textured with two texture layers using multitexturing, and most textures come from the same texture atlas, stored as a PVRTC 2bpp compressed texture. This renders on my phone at around 30 fps, which seems far too low for only 1900 triangles.

I tried many things to improve performance, including batching the objects together: transforming the vertices on the CPU and rendering them in a single draw call. This yields 8 draw calls (as opposed to 150), but performance is about the same (the frame rate drops to around 26 fps).
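For readers unfamiliar with this kind of batching, here is a minimal sketch of the CPU-side step, not the code from the question. It assumes 8 floats per vertex (3 position, 3 normal, 2 UV), a column-major 4x4 model matrix as OpenGL stores it, and a rigid transform (rotation plus translation), so normals can reuse the rotation part. The name `batch_append` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 3 position floats, 3 normal floats, 2 uv floats. */
enum { FLOATS_PER_VERTEX = 8 };

/* Transforms src_count vertices by the column-major matrix m and appends
 * them to batch, which already holds batch_count vertices.
 * Returns the new number of vertices in the batch. */
static size_t batch_append(float *batch, size_t batch_count, size_t batch_capacity,
                           const float *src, size_t src_count, const float m[16])
{
    assert(batch_count + src_count <= batch_capacity);

    for (size_t i = 0; i < src_count; ++i) {
        const float *in = src + i * FLOATS_PER_VERTEX;
        float *out = batch + (batch_count + i) * FLOATS_PER_VERTEX;
        const float px = in[0], py = in[1], pz = in[2];
        const float nx = in[3], ny = in[4], nz = in[5];

        /* Position: rotation part plus translation column. */
        out[0] = m[0] * px + m[4] * py + m[8]  * pz + m[12];
        out[1] = m[1] * px + m[5] * py + m[9]  * pz + m[13];
        out[2] = m[2] * px + m[6] * py + m[10] * pz + m[14];

        /* Normal: rotation part only (valid for rigid transforms). */
        out[3] = m[0] * nx + m[4] * ny + m[8]  * nz;
        out[4] = m[1] * nx + m[5] * ny + m[9]  * nz;
        out[5] = m[2] * nx + m[6] * ny + m[10] * nz;

        /* Texture coordinates are copied unchanged. */
        out[6] = in[6];
        out[7] = in[7];
    }
    return batch_count + src_count;
}
```

Once every object that shares the same textures and render state has been appended, the whole batch can be submitted with one draw call. Note that this moves the per-vertex transform work from the GPU to the CPU, which may be why the frame rate did not improve here.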

I'm using 32-byte vertices stored in an interleaved array (12 bytes position, 12 bytes normal, 8 bytes UV). I'm rendering triangle lists, and the vertices are ordered in triangle-strip order.
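As an illustration of that layout, a sketch of what such a vertex could look like in C. The struct and constant names are hypothetical, and the pointer calls in the comment assume the OpenGL ES 1.1 fixed-function pipeline, which the question does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: 12 + 12 + 8 = 32 bytes. */
typedef struct {
    float position[3]; /* byte offset 0  */
    float normal[3];   /* byte offset 12 */
    float uv[2];       /* byte offset 24 */
} Vertex;

/* Fail at compile time if the compiler pads the struct unexpectedly. */
static_assert(sizeof(Vertex) == 32, "Vertex must be exactly 32 bytes");

enum {
    VERTEX_STRIDE   = sizeof(Vertex),
    POSITION_OFFSET = offsetof(Vertex, position),
    NORMAL_OFFSET   = offsetof(Vertex, normal),
    UV_OFFSET       = offsetof(Vertex, uv)
};

/* With a pointer `base` to the first vertex, the arrays would typically
 * be set up like this under OpenGL ES 1.1 (assumption, not from the post):
 *
 *   glVertexPointer(3, GL_FLOAT, VERTEX_STRIDE, base + POSITION_OFFSET);
 *   glNormalPointer(GL_FLOAT, VERTEX_STRIDE, base + NORMAL_OFFSET);
 *   glTexCoordPointer(2, GL_FLOAT, VERTEX_STRIDE, base + UV_OFFSET);
 */
```

Keeping the stride and offsets derived from the struct means the array setup cannot drift out of sync if the vertex format changes.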

I did some profiling, but I don't really know how to interpret the results.

  1. Instruments (sampling): sampling with Instruments yields this result: http://neo.cycovery.com/instruments_sampling.gif. It tells me that a lot of time is spent in "mach_msg_trap". I googled it, and it seems this function is called in order to wait for something else. But wait for what?

  2. Instruments (OpenGL ES): Instruments with the OpenGL ES module yields this result: http://neo.cycovery.com/intstruments_openglES_debug.gif, but here I really have no idea what those numbers are telling me.

  3. Shark: profiling with Shark didn't tell me much either: http://neo.cycovery.com/shark_profile_release.gif. The largest number is 10%, spent in DrawTriangles, and the rest is spread across functions with very small percentages.

Can anyone tell me what else I could do to find the bottleneck, and help me interpret this profiling information?

Thanks a lot!

genesys asked Jan 16 '10 00:01



1 Answer

You’re probably CPU-bound. The tiler/renderer utilization statistics in the OpenGL ES instrument show that the GPU’s duty cycle is 20-30% while rendering at 20-30 fps, which suggests that the GPU could run at 60 fps if it were fed fast enough. There are a few things you could do to get more information out of Instruments and Shark about what to pursue:

By default, Sampler shows every sample from every thread, which means that mostly-idle helper threads created by system frameworks will dominate your view. To get a better idea of what the CPU is actually doing, make sure the Detail View is showing (third button from the left in the lower left corner) and change Sample Perspective to Running Sample Times to exclude samples where a thread is idle/blocked.

I don’t see any samples in the Shark trace from your app itself. That may well be because your code is fast enough that it doesn’t appear anywhere in the list of hot functions, but it might also be because Shark can’t find symbols for your application. You might need to configure the search paths in its preferences or manually point Shark at your app binary. Also, Shark defaults to showing a list of functions ordered by how much CPU time is spent in them. It may be useful to change the view to something more like a regular call tree, so you can visualize how your overall render loop spends its time. To do this, change the View option in the lower-right corner to “Tree (Top-Down).” (If you don’t see your app name or functions here either, then Shark is definitely missing your symbols.)
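The headroom estimate at the start of this answer can be made concrete with a little arithmetic. The sketch below assumes that GPU work per frame stays constant and that the utilization figure covers all GPU-side cost, which is a simplification; `gpu_bound_fps` is a hypothetical helper, not part of any Apple tool.

```c
#include <assert.h>

/* Estimates the frame rate the GPU could sustain if it were never idle.
 * measured_fps:    frame rate observed while profiling
 * gpu_utilization: fraction of time the GPU was busy, in (0, 1]
 * If the GPU is busy only a fraction u of the time at f frames per
 * second, each frame costs u / f seconds of GPU time, so the GPU-side
 * ceiling is roughly f / u frames per second. */
static double gpu_bound_fps(double measured_fps, double gpu_utilization)
{
    assert(measured_fps > 0.0);
    assert(gpu_utilization > 0.0 && gpu_utilization <= 1.0);
    return measured_fps / gpu_utilization;
}
```

For example, 30 fps at 30% utilization works out to about 10 ms of GPU time per frame, or a ceiling near 100 fps, comfortably above 60 fps. Under these assumptions the remaining frame time has to be spent elsewhere, most likely on the CPU.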

Pivot answered Sep 28 '22 10:09
