What is the best method to copy pixels from texture to texture?
I've found some ways to accomplish this. For instance, there is glCopyImageSubData(), but my target version is OpenGL 2.1, so I cannot use it. Also, because performance is very important, reading the pixels back with glGetTexImage() is not an option. Since I'm handling video frames as textures, I have to make copies about 30 to 60 times per second.
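The options I have found so far all go through framebuffer objects (FBOs): rendering the source texture into the destination with a shader, glCopyTexSubImage2D(), glBlitFramebuffer(), and glCopyPixels().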
You can ignore the cost of creating the FBOs because they will be created only once.
Please don't post something like "it depends, do your own benchmark." I'm not targeting only one GPU. If it depends, please let me know what it depends on and how.
Furthermore, because it is very difficult to measure the timing of OpenGL calls, what I want to know is not a quantitative result. I need some advice about which methods I should avoid.
If you know a better method to copy textures, please let me know that too.
Thank you for reading.
Since I didn't know about timer queries, I hadn't thought of benchmarking. Now I can run my own benchmarks. I measured the time for batches of 100 operations and repeated each measurement five times. The cost of creating the FBOs is not included.
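For anyone repeating the measurement, here is a minimal sketch of how the raw timer-query results could be turned into op/s. Timer queries report GL_TIME_ELAPSED in nanoseconds; the function names below are hypothetical, and averaging the per-batch op/s values (rather than dividing total operations by total time) is an assumption about how the average was taken.

```c
#include <stdint.h>

/* Convert one timer-query result into operations per second.
 * elapsed_ns: GPU time for the whole batch, in nanoseconds
 *             (the unit used by GL_TIME_ELAPSED).
 * ops:        number of copy operations in the batch (100 in the runs above).
 * Returns 0.0 for an empty measurement instead of dividing by zero. */
double ops_per_second(uint64_t elapsed_ns, unsigned ops)
{
    if (elapsed_ns == 0) {
        return 0.0;
    }
    return (double)ops * 1e9 / (double)elapsed_ns;
}

/* Average op/s over several repeated batches (five in the runs above).
 * Assumption: the mean is taken over per-batch op/s values. */
double mean_ops_per_second(const uint64_t *elapsed_ns, int runs, unsigned ops)
{
    double sum = 0.0;
    for (int i = 0; i < runs; ++i) {
        sum += ops_per_second(elapsed_ns[i], ops);
    }
    return runs > 0 ? sum / runs : 0.0;
}
```

The elapsed_ns values would come from a 64-bit query read such as glGetQueryObjectui64v() on a GL_TIME_ELAPSED query, which on OpenGL 2.1 is only available through the timer query extension.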
- S = source texture, D = destination texture, SF = FBO of S, DF = FBO of D
- operation = copying texture to texture
- op/s = number of operations per second (average); larger is better
Create DF and render S to DF using simple passthrough shader
Create SF and use glCopyTexSubImage2D() for D
Create DF and SF and use glBlitFramebuffer()
Create DF and SF and use glCopyPixels()
passthrough shader ~ glCopyTexSubImage2D > glBlitFramebuffer >> glCopyPixels
So a simple passthrough shader shows the best performance for copying textures. glCopyTexSubImage2D() is slightly slower than the passthrough shader. FBO blitting is fast enough, but worse than both the shader and glCopyTexSubImage2D(). glCopyPixels(), from which I didn't expect a good result, shows the worst performance, as expected.
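For reference, a passthrough shader pair of the kind benchmarked here might look like the sketch below. This is not the exact shader used for the numbers above; it is a minimal GLSL 1.20 version (the shading language that ships with OpenGL 2.1), embedded as C string constants, and the names kPassthroughVS, kPassthroughFS, a_pos, v_uv and u_src are all made up for illustration.

```c
/* Vertex shader: expects the four corners of a clip-space quad in [-1, 1]
 * and derives texture coordinates in [0, 1] from them. */
const char *const kPassthroughVS =
    "#version 120\n"
    "attribute vec2 a_pos;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    v_uv = a_pos * 0.5 + 0.5;\n"
    "    gl_Position = vec4(a_pos, 0.0, 1.0);\n"
    "}\n";

/* Fragment shader: writes the sampled source texel unchanged. */
const char *const kPassthroughFS =
    "#version 120\n"
    "uniform sampler2D u_src;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    gl_FragColor = texture2D(u_src, v_uv);\n"
    "}\n";
```

With DF bound as the render target, the viewport set to the size of D, S bound to the texture unit that u_src points at, and blending and depth testing disabled, drawing one quad covering the viewport copies S into D.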
We ultimately ended up rendering a quad into the target. With minimal shaders, lowp precision, and so on, the performance difference between the various methods that use the GPU to do the blit is slight, and this approach gives the most flexibility.
However, if you can find a way to avoid pure copy operations entirely, that will of course be much faster. For example, you could change an operation that mutates one of your copies into an operation that reads the original, applies the mutation, and generates the new copy all in one pass.
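As a rough illustration of folding the mutation into the copy, the fragment shader below reads the original texture, applies a change, and writes the result straight into the new texture. The brightness gain is only a stand-in for whatever mutation you actually need; kCopyWithGainFS, u_src, u_gain and v_uv are hypothetical names, and the sketch assumes a vertex shader that writes a `varying vec2 v_uv` with texture coordinates in [0, 1].

```c
/* Fragment shader (GLSL 1.20) that copies and modifies in a single pass:
 * the source texel is scaled by u_gain on its way to the destination,
 * so no separate "copy first, then mutate" step is needed. */
const char *const kCopyWithGainFS =
    "#version 120\n"
    "uniform sampler2D u_src;\n"
    "uniform float u_gain;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    vec4 c = texture2D(u_src, v_uv);\n"
    "    gl_FragColor = vec4(c.rgb * u_gain, c.a);\n"
    "}\n";
```

Setting u_gain to 1.0 makes this behave like a plain copy, so the same program could cover both cases if switching programs turns out to matter.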