I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define GL_MAX_VIEWPORTS viewports. Then in the rendering phase, I transfer an array of GL_MAX_VIEWPORTS model-pose matrices to a mat4 uniform array in GPU memory (I am only interested in estimating position and orientation), and use gl_InvocationID in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
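As a rough illustration of the setup phase described above, here is a minimal sketch that packs the viewport rectangles into the flat (x, y, width, height) float layout that glViewportArrayv expects. The function name makeViewportGrid, the grid arrangement, and the tile size are assumptions made for the example; the actual layout used in my code may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: packs `count` viewports as consecutive
// (x, y, width, height) floats, filling a grid row by row with
// `columns` tiles per row. Tile size and grid layout are assumptions.
std::vector<float> makeViewportGrid(int count, int columns,
                                    float tileW, float tileH) {
    std::vector<float> v;
    v.reserve(4 * static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        v.push_back((i % columns) * tileW);  // x of lower-left corner
        v.push_back((i / columns) * tileH);  // y of lower-left corner
        v.push_back(tileW);                  // width
        v.push_back(tileH);                  // height
    }
    return v;
}
```

With a current OpenGL context, the result could be passed along the lines of glViewportArrayv(0, count, v.data()), after which the geometry shader selects a rectangle per primitive by writing gl_ViewportIndex.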
GL_MAX_VIEWPORTS is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there a workaround for the GL_MAX_VIEWPORTS limitation that is likely to be faster than calling my render function ceil(double(N)/GL_MAX_VIEWPORTS) times?
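For reference, the baseline I am comparing against can be sketched as follows: split the N hypotheses into batches of at most GL_MAX_VIEWPORTS and issue one render call per batch. The names passCount and passRanges are hypothetical, and integer arithmetic stands in for the ceil(double(...)) expression.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Number of render calls needed for n hypotheses when one call can
// cover at most maxPerPass of them (integer form of ceil(n / maxPerPass)).
int passCount(int n, int maxPerPass) {
    return (n + maxPerPass - 1) / maxPerPass;
}

// Half-open [first, last) hypothesis ranges, one per render call.
std::vector<std::pair<int, int>> passRanges(int n, int maxPerPass) {
    std::vector<std::pair<int, int>> ranges;
    for (int first = 0; first < n; first += maxPerPass) {
        ranges.emplace_back(first, std::min(first + maxPerPass, n));
    }
    return ranges;
}
```

With N = 64 and a limit of 16 this gives four calls, each uploading its own slice of the pose-matrix array.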
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:
a geometry shader that adds h*gl_InvocationID to the y coordinates of the vertices after perspective projection (where h is the desired viewport height) and passes gl_InvocationID onto the fragment shader; and
a fragment shader that discards fragments with y coordinates that satisfy y<gl_InvocationID*h || y>=(gl_InvocationID+1)*h.
But I was put off investigating this idea further by the fear that branching and discard would be very detrimental to performance.
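To make the arithmetic of that idea concrete, here is a CPU-side sketch of the two pieces, not shader code. It assumes the tiles are stacked vertically in one tall framebuffer and that window-space y grows upwards. Because positions leaving a geometry shader are still in clip space, the vertical shift is written as a scale plus an offset multiplied by w, which is one way to express "adding h per tile after projection". The names Clip, remapToTile and shouldDiscard are hypothetical.

```cpp
// Clip-space position as it would leave the geometry shader.
struct Clip {
    float x, y, z, w;
};

// Squeezes the full NDC y range [-1, 1] into the band that belongs to
// tile `tile` out of `tileCount` vertically stacked tiles. Working in
// clip space, the NDC offset has to be multiplied by w.
Clip remapToTile(Clip p, int tile, int tileCount) {
    const float scale = 1.0f / tileCount;
    const float centre = (2.0f * tile + 1.0f) / tileCount - 1.0f;
    p.y = p.y * scale + centre * p.w;
    return p;
}

// The fragment-side test from the question: true when a fragment at
// window-space height fragY lies outside the tile of height h.
bool shouldDiscard(float fragY, int tile, float h) {
    return fragY < tile * h || fragY >= (tile + 1) * h;
}
```

In a real shader the second function would correspond to comparing gl_FragCoord.y against the tile bounds and executing discard, which is exactly the branch whose cost I am unsure about.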
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.
From a cursory glance at the paper, it seems to me that the actual viewport doesn't change. That is, you're still rendering to the same width/height and X/Y positions, with the same depth range.
What you want is to change which image you're rendering to. That is what gl_Layer is for: it selects which layer of the layered array of images attached to the framebuffer you are rendering to.
So just set gl_ViewportIndex to 0 for all vertices. Or more specifically, don't set it at all.
The number of GS instancing invocations does not have to be a restriction; that's your choice. GS invocations can write multiple primitives, each to a different layer. So you could have each invocation write, for example, 4 primitives, each to its own layer.
Your only limitations should be the number of layers you can use (governed by GL_MAX_ARRAY_TEXTURE_LAYERS and GL_MAX_FRAMEBUFFER_LAYERS, both of which must be at least 2048), and the number of primitives and vertex data that a single GS invocation can emit (which is kind of complicated).
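A minimal sketch of the layered approach, under some assumptions: the GLSL layout qualifiers need compile-time constants, so the host code here generates the geometry shader source for a chosen number of invocations and layers per invocation. The function name makeLayeredGeometryShader and the uniform name poseMatrices are hypothetical, the vertex shader is assumed to pass model-space positions through in gl_Position, and the framebuffer is assumed to have a layered (array texture) attachment. Whether a pose array of that size fits in plain uniforms depends on the implementation's limits.

```cpp
#include <string>

// Builds GLSL source for a geometry shader in which each invocation
// emits `layersPerInvocation` copies of the input triangle, each
// transformed by its own pose matrix and routed to its own layer.
std::string makeLayeredGeometryShader(int invocations,
                                      int layersPerInvocation) {
    const std::string inv = std::to_string(invocations);
    const std::string per = std::to_string(layersPerInvocation);
    const std::string total =
        std::to_string(invocations * layersPerInvocation);
    const std::string maxVerts = std::to_string(3 * layersPerInvocation);

    std::string src;
    src += "#version 430 core\n";
    src += "layout(triangles, invocations = " + inv + ") in;\n";
    src += "layout(triangle_strip, max_vertices = " + maxVerts + ") out;\n";
    src += "uniform mat4 poseMatrices[" + total + "]; // hypothetical name\n";
    src += "void main() {\n";
    src += "    for (int k = 0; k < " + per + "; ++k) {\n";
    src += "        int layer = gl_InvocationID * " + per + " + k;\n";
    src += "        for (int v = 0; v < 3; ++v) {\n";
    src += "            gl_Position = poseMatrices[layer] * gl_in[v].gl_Position;\n";
    src += "            // Outputs are undefined after EmitVertex(), so set per vertex.\n";
    src += "            gl_Layer = layer;\n";
    src += "            EmitVertex();\n";
    src += "        }\n";
    src += "        EndPrimitive();\n";
    src += "    }\n";
    src += "}\n";
    return src;
}
```

For example, makeLayeredGeometryShader(16, 4) yields a shader covering 64 layers in one draw call, with gl_ViewportIndex left unset so every layer uses viewport 0.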