
Metal: emulate geometry shaders using compute shaders

I'm trying to implement voxel cone tracing in Metal. One of the steps in the algorithm is to voxelize the geometry using a geometry shader. Metal does not have geometry shaders, so I looked into emulating them with a compute shader: I pass my vertex buffer into the compute shader, do what a geometry shader would normally do, and write the result to an output buffer. I also add a draw command to an indirect buffer. I then use the output buffer as the vertex buffer for my vertex shader. This works fine, but it needs twice as much memory for my vertices: one copy in the vertex buffer and one in the output buffer. Is there any way to pass the output of the compute shader directly to the vertex shader without storing it in an intermediate buffer? I don't need to keep the contents of the compute shader's output buffer; I just need to hand the results to the vertex shader.
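
Roughly, the encoding I'm doing now looks like this (pipeline and buffer names are placeholders, not my actual code):

import Metal

// Pass 1: the compute "geometry shader" expands each triangle into outputBuffer
// and writes a MTLDrawPrimitivesIndirectArguments value into indirectBuffer.
let computeEncoder = commandBuffer.makeComputeCommandEncoder()!
computeEncoder.setComputePipelineState(voxelizeComputePipeline)
computeEncoder.setBuffer(vertexBuffer,   offset: 0, index: 0) // original vertices
computeEncoder.setBuffer(indexBuffer,    offset: 0, index: 1) // original indices
computeEncoder.setBuffer(outputBuffer,   offset: 0, index: 2) // expanded vertices (the duplicate memory)
computeEncoder.setBuffer(indirectBuffer, offset: 0, index: 3) // draw arguments
computeEncoder.dispatchThreadgroups(threadgroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
computeEncoder.endEncoding()

// Pass 2: draw using the compute output as the vertex buffer.
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
renderEncoder.setRenderPipelineState(voxelRenderPipeline)
renderEncoder.setVertexBuffer(outputBuffer, offset: 0, index: 0)
renderEncoder.drawPrimitives(type: .triangle, indirectBuffer: indirectBuffer, indirectBufferOffset: 0)
renderEncoder.endEncoding()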

Is this possible? Thanks

EDIT

Essentially, I'm trying to emulate the following GLSL geometry shader:

#version 450

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

layout(location = 0) in vec3 in_position[];
layout(location = 1) in vec3 in_normal[];
layout(location = 2) in vec2 in_uv[];

layout(location = 0) out vec3 out_position;
layout(location = 1) out vec3 out_normal;
layout(location = 2) out vec2 out_uv;

void main()
{
    vec3 p = abs(cross(in_position[1] - in_position[0], in_position[2] - in_position[0]));

    for (uint i = 0; i < 3; ++i)
    {
        out_position = in_position[i];
        out_normal = in_normal[i];
        out_uv = in_uv[i];

        if (p.z > p.x && p.z > p.y)
        {
            gl_Position = vec4(out_position.x, out_position.y, 0, 1);
        }
        else if (p.x > p.y && p.x > p.z)
        {
            gl_Position = vec4(out_position.y, out_position.z, 0, 1);
        }
        else
        {
            gl_Position = vec4(out_position.x, out_position.z, 0, 1);
        }

        EmitVertex();
    }

    EndPrimitive();
}

For each input triangle, I need to output a triangle with its vertices at these new positions instead. The triangle vertices come from a vertex buffer and are drawn with an index buffer. I also plan to add code for conservative rasterization (just growing each triangle slightly), but that isn't shown here. Currently, my Metal compute shader uses the index buffer to fetch each triangle's vertices, runs the same code as the geometry shader above, and writes the new vertices to another buffer, which I then use to draw.

asked May 27 '18 by theonewhoknocks

1 Answer

Here's a very speculative possibility depending on exactly what your geometry shader needs to do.

I'm thinking you can do it sort of "backwards" with just a vertex shader and no separate compute shader, at the cost of redundant work on the GPU. You would do a draw as if you had a buffer of all of the output vertices of the output primitives of the geometry shader. You would not actually have that on hand, though. You would construct a vertex shader that would calculate them in flight.

So, in the app code, calculate the number of output primitives and therefore the number of output vertices that would be produced for a given count of input primitives. Do a draw of the output primitive type with that many vertices.

You would not provide a buffer with the output vertex data as input to this draw.

You would provide the original index buffer and original vertex buffer as inputs to the vertex shader for that draw. The shader would calculate from the vertex ID which output primitive it's for, and which vertex of that primitive (e.g. for a triangle, vid / 3 and vid % 3, respectively). From the output primitive ID, it would calculate which input primitive would have generated it in the original geometry shader.
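
A minimal sketch of that draw setup on the app side (one output triangle per input triangle here; the pipeline and buffer names are placeholders):

import Metal

// One output triangle per input triangle, so the draw needs 3 vertices per input triangle.
let outputVertexCount = inputTriangleCount * 3

let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
renderEncoder.setRenderPipelineState(geometryEmulationPipeline)

// No buffer of generated vertices is bound; the vertex shader reads the original
// index buffer and vertex buffer itself and derives everything from [[vertex_id]].
renderEncoder.setVertexBuffer(indexBuffer,  offset: 0, index: 0)
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 1)

// Non-indexed draw of the computed vertex count.
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: outputVertexCount)
renderEncoder.endEncoding()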

The shader would look up the indices for that input primitive from the index buffer and then the vertex data from the vertex buffer. (This would be sensitive to the distinction between a triangle list vs. triangle strip, for example.) It would apply any pre-geometry-shader vertex shading to that data. Then it would do the part of the geometry computation that contributes to the identified vertex of the identified output primitive. Once it has calculated the output vertex data, you can apply any post-geometry-shader vertex shading(?) that you want. The result is what it would return.

If the geometry shader can produce a variable number of output primitives per input primitive, well, at least you have a maximum number. So, you can draw the maximum potential count of vertices for the maximum potential count of output primitives. The vertex shader can do the computations necessary to figure out if the geometry shader would have, in fact, produced that primitive. If not, the vertex shader can arrange for the whole primitive to be clipped away, either by positioning it outside of the frustum or using a [[clip_distance]] property of the output vertex data.
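
For example, a sketch with hypothetical counts:

// Worst case: every input primitive emits the maximum number of output primitives.
// The vertex shader culls (via clipping) any primitive the geometry shader would not have emitted.
let maxOutputVertexCount = inputPrimitiveCount * maxPrimitivesPerInputPrimitive * 3
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: maxOutputVertexCount)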

This avoids ever storing the generated primitives in a buffer. However, it causes the GPU to do some of the pre-geometry-shader vertex shader and geometry shader calculations repeatedly. It will be parallelized, of course, but may still be slower than what you're doing now. Also, it may defeat some optimizations around fetching indices and vertex data that may be possible with more normal vertex shaders.


Here's an example conversion of your geometry shader:

#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    // maybe need packed types here depending on your vertex buffer layout
    // can't use [[attribute(n)]] for these because Metal isn't doing the vertex lookup for us
    float3 position;
    float3 normal;
    float2 uv;
};

struct VertexOut {
    float3 position;
    float3 normal;
    float2 uv;
    float4 new_position [[position]];
};


vertex VertexOut foo(uint vid [[vertex_id]],
                     device const uint *indexes [[buffer(0)]],
                     device const VertexIn *vertexes [[buffer(1)]])
{
    VertexOut out;

    const uint triangle_id = vid / 3;
    const uint vertex_of_triangle = vid % 3;

    // indexes describes a triangle strip even though this shader is invoked for a triangle list;
    // for an indexed triangle list, use indexes[triangle_id * 3 + n] instead.
    const uint index[3] = { indexes[triangle_id], indexes[triangle_id + 1], indexes[triangle_id + 2] };
    const VertexIn v[3] = { vertexes[index[0]], vertexes[index[1]], vertexes[index[2]] };

    float3 p = abs(cross(v[1].position - v[0].position, v[2].position - v[0].position));

    out.position = v[vertex_of_triangle].position;
    out.normal = v[vertex_of_triangle].normal;
    out.uv = v[vertex_of_triangle].uv;

    if (p.z > p.x && p.z > p.y)
    {
        out.new_position = float4(out.position.x, out.position.y, 0, 1);
    }
    else if (p.x > p.y && p.x > p.z)
    {
        out.new_position = float4(out.position.y, out.position.z, 0, 1);
    }
    else
    {
        out.new_position = float4(out.position.x, out.position.z, 0, 1);
    }

    return out;
}
answered Oct 01 '22 by Ken Thomases