Deferred Rendering with Tile-Based culling Concept Problems

EDIT: I'm still looking for help with using OpenCL or compute shaders. I would prefer to keep using OGL 3.3 and not have to deal with the bad driver support for OGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading (matching lights to tiles) without using one of the two. Is it possible to implement tile-based culling without using GPGPU?

I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This obviously has a ton of overdraw (sometimes ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems that the best way, in almost everyone's opinion, is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation from Andrew Lauritzen at SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (or, for that matter, why it's better than anything else, and whether it is better for me).

In the presentation, Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest by far. I don't understand why, though. I'm guessing it has something to do with the fact that, for each tile, all the lights are batched together. The presentation says to read the G-Buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:

for each tile {
  for each light affecting the tile {
    render quad (the tile) and compute lighting
    blend with previous tiles (GL_ONE, GL_ONE)
  }
}

This would still involve sampling the G-Buffer a lot. I would think that doing so would perform the same as (if not worse than) rendering a screen-aligned quad for every light. From how it's worded, though, it seems like this is what's happening:

for each tile {
  render quad (the tile) and compute all lights
}

But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights). Why is this, and what would happen if I didn't use them?

asked Apr 14 '13 00:04

Spaceman1701

1 Answer

"But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs."

It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed instruction limits.
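Working out which lights affect a tile is the part that is usually handed to a compute shader. As a rough illustration only, the same binning could be done on the CPU, with the fragment shader then looping over its tile's list. The sketch below is plain Python, not shader code, and everything in it is an assumption made for illustration: the `Light` type, the 16-pixel tile size, the screen-space circle test, and the toy falloff. A real renderer would project light bounds into screen space and likely use the tile's depth range as well. The per-tile lists could then be passed to the shader through something like a buffer texture or a uniform buffer, both available in OpenGL 3.3.

```python
from dataclasses import dataclass

TILE = 16  # tile size in pixels (assumed; not taken from the question)


@dataclass
class Light:
    # Hypothetical screen-space light: centre and radius in pixels.
    x: float
    y: float
    radius: float
    intensity: float


def bin_lights(lights, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of lights whose circle overlaps it."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for i, light in enumerate(lights):
        for (tx, ty), indices in bins.items():
            # Closest point of the tile rectangle to the light centre.
            cx = min(max(light.x, tx * tile), (tx + 1) * tile)
            cy = min(max(light.y, ty * tile), (ty + 1) * tile)
            if (cx - light.x) ** 2 + (cy - light.y) ** 2 <= light.radius ** 2:
                indices.append(i)
    return bins


def shade_pixel(px, py, albedo, lights, indices):
    """Accumulate every light in the tile's list for one pixel.

    `albedo` stands in for the single G-buffer read; the loop reuses it
    for each light instead of sampling again per light.
    """
    total = 0.0
    for i in indices:
        light = lights[i]
        d2 = (px - light.x) ** 2 + (py - light.y) ** 2
        r2 = light.radius ** 2
        if d2 < r2:
            total += light.intensity * (1.0 - d2 / r2)  # toy falloff
    return albedo * total
```

With this shape, the cost per pixel grows with the number of lights in that pixel's tile, not with the number of lights in the scene, which is where the loop length in the fragment shader comes from.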

Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly on the GPU, that's probably still sufficient.
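The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope helper; the function name and its parameters are made up for illustration, and the 65536 figure is the estimate quoted above, not a value queried from a driver.

```python
def per_light_budget(total_instructions, num_lights, fixed_overhead=0):
    """Whole instructions available per light after a fixed per-pixel cost."""
    return (total_instructions - fixed_overhead) // num_lights


# 100 lights with no fixed overhead
print(per_light_budget(65536, 100))        # 655
# 100 lights after 2000 instructions spent reconstructing position
print(per_light_budget(65536, 100, 2000))  # 635
```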

answered Sep 19 '22 13:09

Nicol Bolas

Tags:

opengl

lighting

culling

deferred-rendering

deferred-shading
