On iOS, I've found that most (all?) devices have a GL_MAX_VARYING_VECTORS of 8. I've also read (see the note here) that even swizzle operations count as dependent texture reads. Together, these restrictions seem to imply that you cannot have a convolution kernel of more than eight elements (at least, not a maximally efficient one).
Is there a way to evaluate a convolution kernel of more than eight elements without incurring dependent texture reads?
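For context, the standard way to keep the reads non-dependent is to compute every sample coordinate in the vertex shader and pass each one through its own varying, which the fragment shader then uses completely unmodified (no arithmetic, and — per the note above — no swizzle). A minimal sketch of that pattern for a five-tap cross follows; the names (`inputTextureCoordinate`, `texelSize`, etc.) are my own assumptions, not anything specified in this question:

```glsl
// --- Vertex shader ---
// Compute every tap coordinate here so the fragment shader can
// sample with them unmodified (non-dependent texture reads).
attribute vec4 position;
attribute vec2 inputTextureCoordinate;

uniform vec2 texelSize; // assumed: (1.0 / texture width, 1.0 / texture height)

varying vec2 centerCoord;
varying vec2 leftCoord;
varying vec2 rightCoord;
varying vec2 topCoord;
varying vec2 bottomCoord;
// ...at most GL_MAX_VARYING_VECTORS (8) of these on the devices in question.

void main()
{
    gl_Position = position;
    centerCoord = inputTextureCoordinate;
    leftCoord   = inputTextureCoordinate + vec2(-texelSize.x, 0.0);
    rightCoord  = inputTextureCoordinate + vec2( texelSize.x, 0.0);
    topCoord    = inputTextureCoordinate + vec2(0.0, -texelSize.y);
    bottomCoord = inputTextureCoordinate + vec2(0.0,  texelSize.y);
}

// --- Fragment shader ---
// Each varying is used as-is; even a swizzle here would turn the
// lookup into a dependent read on these GPUs.
precision mediump float;

uniform sampler2D inputTexture;

varying vec2 centerCoord;
varying vec2 leftCoord;
varying vec2 rightCoord;
varying vec2 topCoord;
varying vec2 bottomCoord;

void main()
{
    vec4 sum = texture2D(inputTexture, centerCoord)
             + texture2D(inputTexture, leftCoord)
             + texture2D(inputTexture, rightCoord)
             + texture2D(inputTexture, topCoord)
             + texture2D(inputTexture, bottomCoord);
    gl_FragColor = sum / 5.0;
}
```

With only eight varying vectors available, this pattern caps out at eight taps per pass, which is exactly the limitation the question is about.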
EDIT: In case it makes any difference, my kernel is a rotated square:
    •
  • • •
• • • • •
  • • •
    •
My current tack is to create two versions of the texture — one offset relative to the other by (1, 1) — and use this kernel:
  •
• • •
• • •
  •
I don't know whether the doubled-up data flow will outweigh the benefit of avoiding the dependent texture reads. As @TraxNet suggests, I'll probably just have to measure it.
Together, these restrictions seem to imply that you cannot have a convolution kernel of more than eight elements (at least, not a maximally efficient one).
I suppose you mean only eight elements without a dependent read. You can look up the texture more times by generating new texture coordinates in the fragment shader (which makes those reads dependent).

Depending on how spread out your lookups are, some of them may hit the texture cache, which can mitigate part of the performance cost. Also, even if you use a uniform ("constant") to displace the texture coordinates, that doesn't mean the shader compiler cannot optimize that code path and fetch your texture data before the shader executes. But yes, you are correct: without nine varying vectors you cannot implement a full 3x3 convolution from the vertex shader alone.

In the end you need to measure and decide.
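For the taps that don't fit in the varying budget, the dependent-read fallback described above looks like this in the fragment shader (a sketch only; `inputTexture`, `texelSize`, and `centerCoord` are assumed names):

```glsl
// Fragment shader: coordinates derived here (center + offset) make these
// reads dependent, but nearby taps often still hit the texture cache.
precision mediump float;

uniform sampler2D inputTexture;
uniform vec2 texelSize; // assumed: (1.0 / texture width, 1.0 / texture height)

varying vec2 centerCoord; // passed through from the vertex shader

void main()
{
    // Non-dependent read: the varying is used unmodified.
    vec4 sum = texture2D(inputTexture, centerCoord);

    // Dependent reads: texture coordinates computed in the fragment shader.
    sum += texture2D(inputTexture, centerCoord + vec2(2.0 * texelSize.x, 0.0));
    sum += texture2D(inputTexture, centerCoord - vec2(2.0 * texelSize.x, 0.0));
    sum += texture2D(inputTexture, centerCoord + vec2(0.0, 2.0 * texelSize.y));
    sum += texture2D(inputTexture, centerCoord - vec2(0.0, 2.0 * texelSize.y));

    gl_FragColor = sum / 5.0;
}
```

Whether the cache absorbs the cost of those extra reads is exactly the kind of thing you can only settle by profiling on the target device.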