In an attempt to improve the display performance of a very large object (one that is filling up GPU RAM), I discovered after some reasonably light maths that I have an opportunity to compress my vertex data from 16-byte vertices down to 4-byte vertices. The data can conceptually be thought of as merely a transformed height map, so the x and y location is implied by the vertex ID, and I can tightly pack the Z coordinate into, say, 30 bits, leaving 2 bits for a colour palette index. That's the idea, anyway. My question isn't about the coordinate packing; it's about the colour packing.
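A minimal C++ sketch of the packing described above. It assumes (my assumption, not necessarily the poster's format) that the quantised height occupies the low 30 bits and the palette index the top 2 bits, and that Z is mapped linearly from a known [zMin, zMax] range. All function names are hypothetical. x and y are recovered from the vertex index, the same way a vertex shader could derive them from gl_VertexID.

```cpp
#include <cmath>
#include <cstdint>

// Assumed layout: bits 0..29 = quantised height, bits 30..31 = palette index.
constexpr std::uint32_t kHeightBits = 30;
constexpr std::uint32_t kHeightMask = (1u << kHeightBits) - 1u;  // 0x3FFFFFFF

// Map z linearly from [zMin, zMax] onto 30 bits and combine it with a 2-bit colour index.
std::uint32_t packVertex(double z, double zMin, double zMax, std::uint32_t colourIndex)
{
    double t = (z - zMin) / (zMax - zMin);   // normalise to [0, 1]
    t = std::fmin(std::fmax(t, 0.0), 1.0);   // clamp out-of-range heights
    auto q = static_cast<std::uint32_t>(std::llround(t * kHeightMask));
    return ((colourIndex & 0x3u) << kHeightBits) | (q & kHeightMask);
}

// The palette index is simply the top 2 bits.
std::uint32_t unpackColourIndex(std::uint32_t v) { return v >> kHeightBits; }

// Undo the linear quantisation.
double unpackHeight(std::uint32_t v, double zMin, double zMax)
{
    return zMin + (zMax - zMin) * (double(v & kHeightMask) / double(kHeightMask));
}

// x and y are never stored: for a row-major grid of width gridWidth
// they follow from the vertex index alone.
void gridPosition(std::uint32_t vertexId, std::uint32_t gridWidth,
                  std::uint32_t& x, std::uint32_t& y)
{
    x = vertexId % gridWidth;
    y = vertexId / gridWidth;
}
```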
The colour palette will be chosen by the C++ code that loads the model. Since that code also loads the shader, I'm currently writing the colour lookup as a switch statement, i.e.:
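vec4 paletteColour(uint compressedVertex)
{
    // Mask and bitOffset select the palette-index bits (set by the C++ loader)
    uint colourIndex = (compressedVertex & Mask) >> bitOffset;
    switch (colourIndex)
    {
    case 0u: return vec4(....);
    case 1u: return vec4(....);
    case 2u: return vec4(....);
    case 3u: return vec4(....);
    }
    return vec4(0.0); // unreachable with a 2-bit index; avoids a missing-return warning
}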
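Because the loader also writes the shader, it can generate the switch directly from the palette. A rough sketch, assuming the palette is held as RGBA floats; `makePaletteSwitch` and the emitted `paletteColour` function are hypothetical names, and the returned string would be spliced into the shader source before compilation:

```cpp
#include <array>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

using Rgba = std::array<float, 4>;

// Emit a GLSL function with one `case` per palette entry.
std::string makePaletteSwitch(const std::vector<Rgba>& palette)
{
    std::ostringstream src;
    src << std::fixed;  // print floats as 1.000000 rather than 1
    src << "vec4 paletteColour(uint colourIndex)\n{\n"
        << "    switch (colourIndex)\n    {\n";
    for (std::size_t i = 0; i < palette.size(); ++i) {
        const Rgba& c = palette[i];
        src << "    case " << i << "u: return vec4("
            << c[0] << ", " << c[1] << ", " << c[2] << ", " << c[3] << ");\n";
    }
    // Fallback for an index outside the palette: magenta makes it easy to spot.
    src << "    }\n    return vec4(1.0, 0.0, 1.0, 1.0);\n}\n";
    return src.str();
}
```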
Where the model has more than 4 colours, I'm comfortable sacrificing bits of height precision to fit more palette bits in (up to a point, anyway). My measurements show that using a switch statement for a 4-colour palette is no slower than binding a 4-pixel 1D texture and reading from it with a sampler.
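The height/palette trade-off is simple arithmetic: with p palette bits, 32 - p bits remain for height. A sketch of the values the loader would hand to the shader (`Mask` and `bitOffset` in the switch snippet), assuming the palette index sits in the top bits; `BitLayout` is a hypothetical name, and p is assumed to lie between 1 and 31 so that no shift reaches 32:

```cpp
#include <cstdint>

// Assumed layout: top paletteBits bits = colour index, remaining low bits = height.
struct BitLayout {
    std::uint32_t paletteBits;  // assumed 1..31

    std::uint32_t heightBits() const { return 32u - paletteBits; }
    std::uint32_t heightMask() const { return (1u << heightBits()) - 1u; }
    std::uint32_t colourOffset() const { return heightBits(); }  // `bitOffset` in the shader
    std::uint32_t colourMask() const { return ~heightMask(); }   // `Mask` in the shader
    std::uint64_t paletteSize() const { return std::uint64_t{1} << paletteBits; }
    std::uint64_t heightLevels() const { return std::uint64_t{1} << heightBits(); }
};
```

For example, 32 colours (5 bits) leave 27 height bits, and 256 colours (8 bits) still leave 24 bits, about 16.7 million height levels.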
I've scaled this up to 32 colours so far, and it seems at least as fast as using a texture.
When is a good line in the sand to stop using a switch and start using a texture as a lookup table? If it helps, the application I'm developing for already enforces a minimum requirement of OpenGL 3.3, and once the data is on the card it will never change. Can I crank it up to 256 case statements? 1024? 32768? Where's the limit?
(Pre-emptive response: yes, I could keep experimenting and pick a value that works on my single, modern card through trial and error and some interpolation, but I'm interested in a more general idea of best practice, and whether anyone else has tried something similar and knows it to work out in the wild.)
I avoid branching as much as possible in shaders. My advice is to use a texture to do the lookup.
You ask:
Can I crank it up to 256 case statements? 1024? 32768? Where's the limit?
and you say:
I've scaled this up to 32 colours so far, and it seems at least as fast as using a texture.
OpenGL thrives at texture lookups; it's designed to do that. It isn't designed for a gigantic switch statement, and, as the commenters say, such a switch won't perform well across the board. A 64x64 texture gives you 4096 lookup entries, and in the long run, in my opinion, it's going to be faster over a larger number of lookups.
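For the texture route, here is a sketch of laying out up to 4096 palette entries in a 64x64 RGBA8 texture, assuming row-major order. On the shader side the lookup would be texelFetch(palette, ivec2(i % 64, i / 64), 0), which is core in GLSL 3.30 and fetches an exact texel with no filtering or normalised coordinates. The texture should still use GL_NEAREST as its minification filter so that it is complete without mipmaps. `paletteTexel` and `buildPaletteImage` are hypothetical names:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed square palette texture: 64 * 64 = 4096 entries, row-major.
constexpr int kPaletteWidth = 64;

struct Texel { int x; int y; };

// Same mapping the shader would use: ivec2(i % 64, i / 64).
Texel paletteTexel(int index) { return {index % kPaletteWidth, index / kPaletteWidth}; }

// Fill the byte buffer that would be uploaded with glTexImage2D
// (format GL_RGBA, type GL_UNSIGNED_BYTE); unused slots stay zero.
std::vector<std::uint8_t> buildPaletteImage(const std::vector<std::array<std::uint8_t, 4>>& colours)
{
    const std::size_t capacity = std::size_t(kPaletteWidth) * kPaletteWidth;
    std::vector<std::uint8_t> image(capacity * 4, 0);
    for (std::size_t i = 0; i < colours.size() && i < capacity; ++i) {
        Texel t = paletteTexel(static_cast<int>(i));
        std::size_t offset = (std::size_t(t.y) * kPaletteWidth + std::size_t(t.x)) * 4;
        for (std::size_t c = 0; c < 4; ++c) image[offset + c] = colours[i][c];
    }
    return image;
}
```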