I was recently looking for ray tracing via opengl tutorials. Most of tutorials prefer compute shaders. I wonder why don't they just render to texture, then render the texture to screen as quad. What is the advantages and disadvantages of compute shader method over screen quad?

Short answer: because compute shaders give you more effective tools to perform complex computations. Long answer: Perhaps the biggest advantage that they afford (in the case of tracing) is the ability to control exactly how work is executed on the GPU. This is important when you're tracing a complex scene. If your scene is trivial (e.g., Cornell Box), then the difference is negligible. Trace some spheres in your fragment shader all day long. Check http://shadertoy.com/ to witness the madness that can be achieved with modern GPUs and fragment shaders. But. If your scene and shading are quite complex, you need to control how work is done. Rendering a quad and doing the tracing in a frag shader is going to, at best, make your application hang while the driver cries, changes its legal name, and moves to the other side of the world...and at worst, crash the driver. Many drivers will abort if a single operation takes too long (which virtually never happens under standard usage, but will happen awfully quickly when you start trying to trace 1M poly scenes). So you're doing too much work in the frag shader...next logical though? Ok, limit the workload. Draw smaller quads to control how much of the screen you're tracing at once. Or use glScissor. Make the workload smaller and smaller until your driver can handle it. Guess what we've just re-invented? Compute shader work groups! Work groups are compute shader's mechanism for controlling job size, and they're a far better abstraction for doing so than fragment-level hackery (when we're dealing with this kind of complex task). Now we can very naturally control how many rays we dispatch, and we can do so without being tightly-coupled to screen-space. For a simple tracer, that adds unnecessary complexity. For a 'real' one, it means that we can easily do sub-pixel raycasting on a jittered grid for AA, huge numbers of raycasts per pixel for pathtracing if we so desire, etc. Other features of compute shaders that are useful for performant, industrial-strength tracers: <ul> <li> Shared Memory between thread groups (allows, for example, packet tracing, wherein an entire packet of spatially-coherent rays are traced at the same time to exploit memory coherence & the ability to communicate with nearby rays)</li> <li> Scatter Writes allow compute shaders to write to arbitrary image locations (note: image and texture are different in subtle ways, but the advantage remains relevant); you no longer have to trace directly from a known pixel location</li> </ul> In general, the architecture of modern GPUs are designed to support this kind of task more naturally using compute. Personally, I have written a real-time progressive path tracer using MLT, kd-tree acceleration, and a number of other computationally-expensive techniques (PT is already extremely expensive). I tried to remain in a fragment shader / full-screen quad as long as I could. Once my scene was complex enough to require an acceleration structure, my driver started choking no matter what hackery I pulled. I re-implemented in CUDA (not quite the same as compute, but leveraging the same fundamental GPU architectural advances), and all was well with the world. If you really want to dig in, have a glance at section 3.1 here: https://graphics.cg.uni-saarland.de/fileadmin/cguds/papers/2007/guenther_07_BVHonGPU/Guenter_et_al._-_Realtime_Ray_Tracing_on_GPU_with_BVH-based_Packet_Traversal.pdf. Frankly the best answer to this question would be an extensive discussion of GPU micro-architecture, and I'm not at all qualified to give that. Looking at modern GPU tracing papers like the one above will give you a sense of how deep the performance considerations go. One last note: any performance advantage of compute over frag in the context of raytracing a complex scene has absolutely nothing to do with rasterization / vertex shader overhead / blending operation overhead, etc. For a complex scene with complex shading, bottlenecks are entirely in the tracing computations, which, as discussed, compute shaders have tools for implementing more efficiently.

Ray tracing via Compute Shader vs Screen Quad

1 Answers

Short answer: because compute shaders give you more effective tools to perform complex computations.

Long answer:

Perhaps the biggest advantage that they afford (in the case of tracing) is the ability to control exactly how work is executed on the GPU. This is important when you're tracing a complex scene. If your scene is trivial (e.g., Cornell Box), then the difference is negligible. Trace some spheres in your fragment shader all day long. Check http://shadertoy.com/ to witness the madness that can be achieved with modern GPUs and fragment shaders.

But. If your scene and shading are quite complex, you need to control how work is done. Rendering a quad and doing the tracing in a frag shader is going to, at best, make your application hang while the driver cries, changes its legal name, and moves to the other side of the world...and at worst, crash the driver. Many drivers will abort if a single operation takes too long (which virtually never happens under standard usage, but will happen awfully quickly when you start trying to trace 1M poly scenes).

So you're doing too much work in the frag shader...next logical though? Ok, limit the workload. Draw smaller quads to control how much of the screen you're tracing at once. Or use glScissor. Make the workload smaller and smaller until your driver can handle it.

Guess what we've just re-invented? Compute shader work groups! Work groups are compute shader's mechanism for controlling job size, and they're a far better abstraction for doing so than fragment-level hackery (when we're dealing with this kind of complex task). Now we can very naturally control how many rays we dispatch, and we can do so without being tightly-coupled to screen-space. For a simple tracer, that adds unnecessary complexity. For a 'real' one, it means that we can easily do sub-pixel raycasting on a jittered grid for AA, huge numbers of raycasts per pixel for pathtracing if we so desire, etc.

Other features of compute shaders that are useful for performant, industrial-strength tracers:

Shared Memory between thread groups (allows, for example, packet tracing, wherein an entire packet of spatially-coherent rays are traced at the same time to exploit memory coherence & the ability to communicate with nearby rays)
Scatter Writes allow compute shaders to write to arbitrary image locations (note: image and texture are different in subtle ways, but the advantage remains relevant); you no longer have to trace directly from a known pixel location

In general, the architecture of modern GPUs are designed to support this kind of task more naturally using compute. Personally, I have written a real-time progressive path tracer using MLT, kd-tree acceleration, and a number of other computationally-expensive techniques (PT is already extremely expensive). I tried to remain in a fragment shader / full-screen quad as long as I could. Once my scene was complex enough to require an acceleration structure, my driver started choking no matter what hackery I pulled. I re-implemented in CUDA (not quite the same as compute, but leveraging the same fundamental GPU architectural advances), and all was well with the world.

If you really want to dig in, have a glance at section 3.1 here: https://graphics.cg.uni-saarland.de/fileadmin/cguds/papers/2007/guenther_07_BVHonGPU/Guenter_et_al._-_Realtime_Ray_Tracing_on_GPU_with_BVH-based_Packet_Traversal.pdf. Frankly the best answer to this question would be an extensive discussion of GPU micro-architecture, and I'm not at all qualified to give that. Looking at modern GPU tracing papers like the one above will give you a sense of how deep the performance considerations go.

One last note: any performance advantage of compute over frag in the context of raytracing a complex scene has absolutely nothing to do with rasterization / vertex shader overhead / blending operation overhead, etc. For a complex scene with complex shading, bottlenecks are entirely in the tracing computations, which, as discussed, compute shaders have tools for implementing more efficiently.

182

answered Oct 01 '22 04:10

Josh Parnell

Related questions
                            
                                difference between SSBO and Image load/store
                            
                                How to apply Bullet physics to drawn Opengl 3d shapes
                            
                                Jogl Shader programming
                            
                                C++ OpenGL application as a web service
                            
                                OpenGL translucent texture over other texture
                            
                                where is GL_MULTISAMPLE defined?
                            
                                OpenGL structure of VAO/VBO for model with moving parts?
                            
                                Modern OpenGL: VBO, GLM and Matrix Stacks
                            
                                Eclipse marks OpenGL functions cannot be resolved
                            
                                Low polygon cone - smooth shading at the tip
                            
                                Noticable lag in a simple OpenGL program with mouse input through GLFW
                            
                                Shader for blurred edges, or how to draw light rays
                            
                                Qt 5.5 with qmake: Linker cannot resolve OpenGL function calls
                            
                                OpenGL - Different clear color for individual color attachments?
                            
                                Howto: install JOGL for Eclipse in Mac OSX 10.6?
                            
                                What OpenGL version to choose for cross-platform desktop application
                            
                                glGenBuffers - freeing memory?
                            
                                How do I make this simple OpenGL code (works in a "lenient" 3.3 and 4.2 profile) work in a strict 3.2 and 4.2 core profile?
                            
                                Modern equivalent of `gluOrtho2d `
                            
                                High CPU usage when GLFW / OpenGL Window isn't visible

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Ray tracing via Compute Shader vs Screen Quad

Tags:

graphics

opengl

fragment-shader

raytracing

compute-shader

Simon Müller

People also ask

1 Answers

Josh Parnell

Recent Activity

Donate For Us