I am developing a large game that streams in level data (including shaders) as you move through the game world. I do not want to have hitches in my frame rate as shaders are compiled/linked or on the first time they are used.
I have shader compilation and linking working on a separate thread with its own OpenGL context. However, I have not been able to get shader prewarming to work on that thread (so that there is no performance hit when a shader is first used).
Prewarming is not really mentioned anywhere in the iOS or OpenGL docs. It is, however, mentioned in the OpenGL ES Analyzer (one of the instruments available when profiling from Xcode). In this tool I get a "Shader Compiled Outside of Prewarming Phase" warning each time something is rendered with a shader that has not been used to render anything before. The "Extended detail" says this:
"OpenGL ES Analyzer detected a shader compilation that is not part of an initial prewarming phase. Shader compilation can be a time consuming operation. To avoid them, prewarm all shaders used for rendering. To do this, make a prewarming pass when your application launches and execute a drawing call with each of the shader programs to be used, using any gl state settings the shader program will be used in conjunction with. States such as blending, color mask, logic ops, multisamping, texture formats, and point primitive state can all affect shader compilation."
The term "compilation" is a little confusing here. The vertex and fragment shaders have already been compiled and the program has been linked. But the first time something is rendered with a given OpenGL state, the driver presumably does some more work on the shader to optimize it for that state.
I have code that prewarms each shader by rendering a zero-sized triangle before its first use.
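For illustration, a prewarm pass of this kind might look roughly like the sketch below. It is not my actual code: `GlApi`, `DrawState`, `prewarm_program` and the `stub_*` functions are hypothetical names. The function table exists only so the sketch builds without GL headers; in a real build its members would wrap `glUseProgram`, `glEnable`/`glDisable(GL_BLEND)`, `glColorMask` and `glDrawArrays`. Only blend and color mask are modeled here; other state would be added the same way.

```c
#include <stddef.h>

/* Hypothetical table of GL entry points so the sketch builds without GL
   headers. In a real build the members would wrap glUseProgram,
   glEnable/glDisable(GL_BLEND), glColorMask and glDrawArrays. */
typedef struct {
    void (*use_program)(unsigned program);
    void (*set_blend)(int enabled);
    void (*color_mask)(int r, int g, int b, int a);
    void (*draw_triangle)(const float xy[6]);
} GlApi;

/* One GL state combination the program will later be drawn with. */
typedef struct {
    int blend;
    int mask[4];
} DrawState;

/* Draws one zero-area triangle per state; returns the number of draws. */
static size_t prewarm_program(const GlApi *gl, unsigned program,
                              const DrawState *states, size_t count)
{
    /* All three vertices coincide, so the triangle covers no pixels. */
    static const float degenerate[6] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    size_t draws = 0;

    gl->use_program(program);
    for (size_t i = 0; i < count; ++i) {
        gl->set_blend(states[i].blend);
        gl->color_mask(states[i].mask[0], states[i].mask[1],
                       states[i].mask[2], states[i].mask[3]);
        gl->draw_triangle(degenerate);
        ++draws;
    }
    return draws;
}

/* Recording stubs standing in for GL, only so the sketch can run as-is. */
static unsigned g_last_program;
static size_t g_draw_calls;

static void stub_use_program(unsigned program) { g_last_program = program; }
static void stub_set_blend(int enabled) { (void)enabled; }
static void stub_color_mask(int r, int g, int b, int a)
{
    (void)r; (void)g; (void)b; (void)a;
}
static void stub_draw_triangle(const float xy[6]) { (void)xy; ++g_draw_calls; }

static const GlApi stub_gl = {
    stub_use_program, stub_set_blend, stub_color_mask, stub_draw_triangle
};
```

Calling `prewarm_program(&stub_gl, program, states, count)` issues one draw per listed state. Whether a zero-area triangle is enough to trigger the deferred work is up to the driver.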
If I compile, link, and prewarm the shaders on the main thread, using the same OpenGL context as the normal rendering, it works. However, if I do it on the background thread with its separate OpenGL context, it does not work (I still get the Analyzer warning on first use).
So it could be that prewarming a shader on a separate context has no effect on other contexts. Or it could be that I don't have all the same state set up in the separate context. There is a lot of potential OpenGL state that might need to be set up. I'm using an offscreen renderbuffer on the background thread, so that could be considered part of the state.
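One way to test the second hypothesis would be to record the state used for each prewarm draw and compare it with the state at the first real draw. The sketch below is only a bookkeeping aid under assumptions: `WarmKey`, `WarmLog` and the helper functions are hypothetical names, the fields follow the states listed in the Analyzer text, and the set a driver actually keys on is unknown to me.

```c
#include <stdbool.h>
#include <stddef.h>

/* State that may affect shader specialization. The fields follow the
   Analyzer's list; the exact set a driver keys on is an assumption. */
typedef struct {
    unsigned program;
    bool blend;
    bool color_mask[4];
    unsigned sample_count;    /* multisampling of the bound framebuffer */
    unsigned texture_format;  /* format of the sampled texture */
    bool point_primitive;
} WarmKey;

#define MAX_WARM_KEYS 256

typedef struct {
    WarmKey keys[MAX_WARM_KEYS];
    size_t count;
} WarmLog;

/* Field-by-field comparison (memcmp could trip over struct padding). */
static bool warm_key_equal(const WarmKey *a, const WarmKey *b)
{
    for (int i = 0; i < 4; ++i)
        if (a->color_mask[i] != b->color_mask[i])
            return false;
    return a->program == b->program &&
           a->blend == b->blend &&
           a->sample_count == b->sample_count &&
           a->texture_format == b->texture_format &&
           a->point_primitive == b->point_primitive;
}

/* True if this exact program/state combination was already prewarmed. */
static bool warm_log_contains(const WarmLog *log, const WarmKey *key)
{
    for (size_t i = 0; i < log->count; ++i)
        if (warm_key_equal(&log->keys[i], key))
            return true;
    return false;
}

/* Records a prewarmed combination; returns false only if the log is full. */
static bool warm_log_add(WarmLog *log, const WarmKey *key)
{
    if (warm_log_contains(log, key))
        return true;
    if (log->count == MAX_WARM_KEYS)
        return false;
    log->keys[log->count++] = *key;
    return true;
}
```

The background thread would call `warm_log_add` after each prewarm draw, and the render thread would call `warm_log_contains` just before the first real draw with a program. The log itself is not thread-safe, so sharing it needs a lock. If every first draw is found in the log and the warning still appears, that would point at the separate context rather than at mismatched state.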
Has anyone succeeded in getting prewarming working on a background thread?
To be honest, I was quite ignorant of this matter until yesterday, even though I have been working on optimizing my engine for a while. So, first of all, thank you for the tip :).
Since then I have looked into shader warming and have not found much about it.
I found a mention in the official AMD documentation, in a document titled "ATI OpenGL Programming and Optimization Guide":
http://developer.amd.com/media/gpu_assets/ATI_OpenGL_Programming_and_Optimization_Guide.pdf
This is the excerpt that refers to warming shaders:
Quote:
While the R500 natively supports flow control in the fragment shading unit, the R300 and R400 asics does not. Static flow control for the R300 and R400 is emulated by the driver compiling out unused conditionals and unrolling loops based on the set constants. Even though the R500 asics family natively support flow control, the driver will still attempt to compile out static flow conditions enabling it to reorganize shader instructions for better instruction scheduling. The driver will also try to cache away the compiled shader for a specific static flow condition set in anticipation for its reuse. So when writing a fragment program that uses static flow control, it is recommended to “warm” the shader cache by rendering a dummy triangle on the very first frame that uses the common static conditional permutations relevant for the life of the shader.
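As a rough sketch of what warming "static conditional permutations" could involve, the helpers below enumerate every on/off combination of a shader's static boolean conditions. This is my own illustration, not something from the AMD guide: `permutation_count` and `decode_permutation` are hypothetical names, and walking all combinations is an assumption, since the guide only asks for the common ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of on/off combinations for flag_count static boolean conditions.
   Returns 0 when there are too many to enumerate sensibly. */
static size_t permutation_count(unsigned flag_count)
{
    if (flag_count >= 16)
        return 0;
    return (size_t)1 << flag_count;
}

/* Expands bit i of mask into out[i], for i in [0, flag_count). */
static void decode_permutation(unsigned mask, unsigned flag_count, bool *out)
{
    for (unsigned i = 0; i < flag_count; ++i)
        out[i] = ((mask >> i) & 1u) != 0;
}

/* Intended use (GL calls shown as comments, not executed here):
 *
 *   bool flags[MAX_FLAGS];
 *   for (unsigned m = 0; m < permutation_count(n); ++m) {
 *       decode_permutation(m, n, flags);
 *       // set each boolean uniform from flags[i], e.g. with glUniform1i
 *       // draw one dummy triangle with the program bound
 *   }
 */
```

With more than a handful of conditions the count grows quickly, so in practice one would probably warm only the combinations the game actually uses.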
The best explanation I have found is the following:
http://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/
Quote:
Incidentally, this is also the reason why you’ll often see a delay the first time you use a new shader or resource; a lot of the creation/compilation work is deferred by the driver and only executed when it’s actually necessary (you wouldn’t believe how much unused crap some apps create!). Graphics programmers know the other side of the story – if you want to make sure something is actually created (as opposed to just having memory reserved), you need to issue a dummy draw call that uses it to “warm it up”. Ugly and annoying, but this has been the case since I first started using 3D hardware in 1999 – meaning, it’s pretty much a fact of life by this point, so get used to it. :)
This presentation mentions how Crytek handled it in the Far Cry engine, though it is mostly about DirectX:
http://www.powershow.com/view/11f2b1-MzUxN/Far_Cry_and_DirectX_flash_ppt_presentation
I hope these links help in some way.