I want to use two PBOs to read pixels back alternately. I expected the PBO path to be much faster, because glReadPixels returns immediately when a PBO is bound, so most of the transfer time can be overlapped with other work.
Strangely, there seems to be little benefit. Consider code like:
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
Timer t; t.start();
glReadPixels(0,0,1024,1024,GL_RGBA, GL_UNSIGNED_BYTE, buf);
t.stop(); std::cout << t.getElapsedTimeInMilliSec() << " ";
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
t.start();
glReadPixels(0,0,1024,1024,GL_RGBA, GL_UNSIGNED_BYTE, 0);
t.stop(); std::cout << t.getElapsedTimeInMilliSec() << std::endl;
The result is
1.301 1.185
1.294 1.19
1.28 1.191
1.341 1.254
1.327 1.201
1.304 1.19
1.352 1.235
The PBO way is a little faster, but nowhere near the immediate return I expected.
My question is: why doesn't glReadPixels return immediately in the PBO case?
Comparing against a working demo, I found two factors that restore the asynchronous behaviour: passing GLUT_ALPHA to glutInitDisplayMode, and reading back as GL_BGRA instead of GL_RGBA. That raises another two questions: why does the alpha flag matter, and why is BGRA faster?
===========================================================================
I do not know glutInitDisplayMode by heart, but this is typically because your internal and external formats do not match. For example, you won't see the asynchronous behaviour when the number of components does not match, because the conversion still blocks inside glReadPixels.
So the most likely issue is that with glutInitDisplayMode(GLUT_RGBA) you actually get a default framebuffer whose internal format is RGB or even BGR. Passing the GLUT_ALPHA flag is likely to make it RGBA or BGRA internally, which matches the number of components you are reading back.
edit: I found an NVIDIA document explaining some issues around pixel packing and its performance impact.
edit2: The performance gain of BGRA is likely because the internal hardware buffer is BGRA; there's not really much more to it.
BGRA is the fastest since this is the native format on modern GPUs. RGBA, RGB and BGR need "reformatting" during readback.