I have seen different opinions on this. For now, I am only concerned with color data.
In Chapter 28. Graphics Pipeline Performance, it says:
Avoid extraneous color-buffer clears. If every pixel is guaranteed to be overwritten in the frame buffer by your application, then avoid clearing color, because it costs precious bandwidth.
In How does glClear() improve performance?, the question quotes Apple's Technical Q&A on addressing flickering (QA1650):
You must provide a color to every pixel on the screen. At the beginning of your drawing code, it is a good idea to use glClear() to initialize the color buffer. A full-screen clear of each of your color, depth, and stencil buffers (if you're using them) at the start of a frame can also generally improve your application's performance.
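For concreteness, the frame setup that quote recommends looks roughly like the following (an OpenGL ES-style sketch; the clear values here are arbitrary placeholders):

```c
/* Clear every buffer you use once, at the very start of the frame. */
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);  /* color the buffer is cleared to */
glClearDepthf(1.0f);                   /* depth clear value */
glClearStencil(0);                     /* stencil clear value */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

/* ... issue all of the frame's draw calls ... */
```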
And one answer in that post:
By issuing a glClear command, you are telling the hardware that you do not need previous buffer content, thus it does not need to copy the color/depth/whatever from the framebuffer to the smaller tile memory.
To that answer, my question is: if there is no blending, why do we need to read color data from the framebuffer? (Again, I am only concerned with color data.)
But anyway, in general, do I need to call glClear(GL_COLOR_BUFFER_BIT)?
Answer:
There are a lot of different kinds of hardware. On hardware that was prevalent when GPU Gems #1 was printed, this advice was sound. Nowadays it no longer is.
Once upon a time, clearing buffers actually meant that the hardware would go to each pixel and write the clear value. This process obviously took a non-trivial amount of GPU time, so high-performance application developers did their best to avoid incurring the wrath of the clear operation.
Nowadays (by which I mean on pretty much any GPU made in the last 8-10 years, at least), graphics chips are smarter about clears. Instead of doing a clear, they play games with the framebuffer's caches.
The value a framebuffer image is cleared to matters when doing read/modify/write (RMW) operations. This includes blending and such, but it also includes any form of depth or stencil testing. In order to do an RMW operation, you must first read the value that's there.
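For example, each of the following state settings puts the hardware on that read/modify/write path (a minimal sketch; the particular blend, depth, and stencil parameters are arbitrary examples):

```c
/* Each of these turns a plain write into a read/modify/write, because
   the incoming fragment must be combined with, or tested against, the
   value already stored in the framebuffer. */
glEnable(GL_BLEND);                    /* destination color is read */
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

glEnable(GL_DEPTH_TEST);               /* stored depth is read */
glDepthFunc(GL_LESS);

glEnable(GL_STENCIL_TEST);             /* stored stencil is read */
glStencilFunc(GL_EQUAL, 0, 0xFF);
```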
This is where the cleverness comes in. When you "clear" a framebuffer image, nothing gets written. Instead, the framebuffer image's address space is invalidated. When a read operation happens to an invalidated address, it simply returns the clear value. This costs zero bandwidth. Indeed, it saves bandwidth, because the read operation doesn't actually have to read memory. It just fetches a clear value.
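A very rough conceptual model of that behavior, in plain C (this is an illustration of the idea only, not actual driver or hardware code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A "lazy" clear marks the image invalid instead of writing pixels;
   reads of an invalid image return the clear value without ever
   touching memory. */
typedef struct {
    uint32_t *pixels;      /* backing memory for the image */
    uint32_t clear_value;  /* value the image was last cleared to */
    bool valid;            /* false => contents are "the clear value" */
} FramebufferImage;

void fb_clear(FramebufferImage *fb, uint32_t value) {
    fb->clear_value = value;
    fb->valid = false;     /* no pixel is written; zero bandwidth */
}

uint32_t fb_read(const FramebufferImage *fb, size_t i) {
    if (!fb->valid)
        return fb->clear_value;  /* read satisfied without a memory access */
    return fb->pixels[i];
}
```

Real hardware tracks validity at a finer granularity (per tile or per cache line rather than per image), but the bandwidth argument is the same.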
Depending on how the cache works, this may even be faster when doing pure write operations. But that depends on the particular hardware.
For mobile hardware that uses tile-based rendering, this matters even more. Before a tile can begin processing, it has to read the current values of the framebuffer images. If the images are cleared, it doesn't need to read anything; it simply sets the tile memory to the clear color.
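On such hardware, the common per-frame idiom pairs a full clear at the start of the frame with an invalidate at the end, so neither the tile load nor an unnecessary tile store happens. A sketch, assuming OpenGL ES 3.0 (or desktop GL 4.3+) and an already-created EGL display and surface:

```c
/* Full clear: tiles start from the clear color, nothing is loaded. */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

/* ... all draw calls for the frame ... */

/* Tell the driver the depth/stencil contents won't be read back,
   so those tiles never need to be stored to memory. */
static const GLenum discard[] = { GL_DEPTH, GL_STENCIL };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, discard);

eglSwapBuffers(display, surface);  /* present the color buffer */
```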
This case matters a lot even if you're not blending to the framebuffer. Why? Because neither the GPU nor the API knows that you won't be blending. It only knows that you're going to perform some number of rendering operations to that image. So it must assume the worst and read the image into the tiles. Unless you cleared it beforehand, of course.
In short, when using those images for framebuffers, clearing the images first is generally no slower than not clearing the images.
The above all assumes that you clear the entire image. If you're only clearing a sub-region of the image, then such optimizations are less likely to happen. Though it may still be possible, at least for the optimizations that are based on cache behavior.
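In OpenGL, a sub-region clear is expressed through the scissor test, which glClear honors (a sketch; the 256x256 region is an arbitrary example):

```c
/* Partial clears like this generally cannot use the whole-image
   fast path described above. */
glEnable(GL_SCISSOR_TEST);
glScissor(0, 0, 256, 256);   /* x, y, width, height */
glClear(GL_COLOR_BUFFER_BIT);
glDisable(GL_SCISSOR_TEST);
```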