
Is the GLM math library compatible with Apple's metal shading language?

I'm about to port an iOS app that utilizes OpenGL written in C++ to Apple's Metal. The goal is to completely get rid of OpenGL and replace it with Metal.

The OpenGL code is layered and I'm attempting to just replace the renderer, i.e. the class that actually calls OpenGL functions. However, the entire code base utilizes the GLM math library to represent vectors and matrices.

For example there is a camera class that provides the view and projection matrix. Both of them are of type glm::mat4 and are simply passed to the GLSL vertex shader where they are compatible with the mat4 data type given by GLSL. I would like to utilize that camera class as it is to send those matrices to the Metal vertex shader. Now, I'm not sure whether glm::mat4 is compatible with Metal's float4x4.

I don't have a working example where I can test this because I literally just started with Metal and can't find anything useful online.

So my questions are as follows:

  1. Are GLM types such as glm::mat4 and glm::vec4 compatible with Metal's float4x4 / float4?
  2. If the answer to question 1 is yes, are there any disadvantages to using GLM types directly with Metal shaders?

The background to question 2 is that I came across Apple's SIMD library, which provides another set of data types that I would not be able to use in such a case, right?

The app is iOS only, I don't care about running Metal on macOS at all.

Code snippets (preferably Objective-C (yes, no joke)) would be very welcome.

ackh asked Dec 23 '22 00:12



1 Answer

Overall, the answer is yes, GLM is a good fit for apps that utilize Apple's Metal. However, there are a couple of things that need to be considered. Some of those things have already been hinted at in the comments.

First of all, the Metal Programming Guide mentions that

Metal defines its Normalized Device Coordinate (NDC) system as a 2x2x1 cube with its center at (0, 0, 0.5)

This means that Metal NDC coordinates are different from OpenGL NDC coordinates because OpenGL defines the NDC coordinate system as a 2x2x2 cube with its center at (0, 0, 0), i.e. valid OpenGL NDC coordinates must be within

// Valid OpenGL NDC coordinates
-1 <= x <= 1
-1 <= y <= 1
-1 <= z <= 1

Because GLM was originally tailored for OpenGL, its glm::ortho and glm::perspective functions create projection matrices that transform coordinates into OpenGL NDC coordinates. Because of this, it is necessary to adjust those coordinates to Metal. How this could be achieved is outlined in this blog post.
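As a rough illustration of what such an adjustment involves (a minimal sketch, not necessarily the approach taken in the blog post), the clip-space depth can be remapped with z' = 0.5 * z + 0.5 * w, which turns the OpenGL depth range [-1, 1] into the Metal range [0, 1] after the perspective divide. The Clip struct and the openGLToMetalClip function are hypothetical names used only for this example; they are not part of GLM or Metal.

```cpp
// Hypothetical stand-in for a clip-space position; not a GLM or Metal type.
struct Clip
{
    float x, y, z, w;
};

// Remap clip-space depth from the OpenGL convention (z/w in [-1, 1])
// to the Metal convention (z/w in [0, 1]). x, y and w are left untouched.
constexpr Clip openGLToMetalClip(const Clip& p)
{
    return Clip{p.x, p.y, 0.5f * p.z + 0.5f * p.w, p.w};
}

// OpenGL near plane (z/w == -1) ends up at Metal depth 0.
static_assert(openGLToMetalClip(Clip{0.0f, 0.0f, -2.0f, 2.0f}).z == 0.0f,
              "near plane should map to depth 0");

// OpenGL far plane (z/w == +1) ends up at Metal depth 1, i.e. z == w.
static_assert(openGLToMetalClip(Clip{0.0f, 0.0f, 2.0f, 2.0f}).z == 2.0f,
              "far plane should map to depth 1");
```

The same remapping can be folded into the projection matrix by multiplying it from the left with a matrix that scales z by 0.5 and adds 0.5 * w.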

However, there is a more elegant way to fix those coordinates. Interestingly, Vulkan utilizes the same NDC coordinate system as Metal and GLM has already been adapted to work with Vulkan (hint for this found here).

By defining the C/C++ preprocessor macro GLM_FORCE_DEPTH_ZERO_TO_ONE before any GLM header is included, the mentioned GLM projection matrix functions will transform coordinates to work with Metal's / Vulkan's NDC coordinate system, i.e. with depth values in the range from 0 to 1. That #define will hence solve the problem with the different NDC coordinate systems.

Next, it is important to take both the size and alignment of GLM's and Metal's data types into account when exchanging data between Metal shaders and client side (CPU) code. Apple's Metal Shading Language Specification lists both size and alignment for some of its data types.

For the data types that aren't listed in there, size and alignment can be determined by utilizing C/C++'s sizeof and alignof operators. Interestingly, both operators are supported within Metal shaders. Here are a couple of examples for both GLM and Metal:

// Size and alignment of some GLM example data types
glm::vec2 : size:  8, alignment: 4
glm::vec3 : size: 12, alignment: 4
glm::vec4 : size: 16, alignment: 4
glm::mat4 : size: 64, alignment: 4

// Size and alignment of some of Metal example data types
float2        : size:  8, alignment:  8
float3        : size: 16, alignment: 16
float4        : size: 16, alignment: 16
float4x4      : size: 64, alignment: 16
packed_float2 : size:  8, alignment:  4
packed_float3 : size: 12, alignment:  4
packed_float4 : size: 16, alignment:  4

As can be seen from the table above, the GLM vector data types match nicely with Metal's packed vector data types both in terms of size and alignment. Note, however, that the 4x4 matrix data types don't match in terms of alignment.
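Such numbers can be pinned down at compile time so that a change in configuration is noticed immediately. The following is a minimal sketch that uses hypothetical stand-in structs (Vec2, Vec3, Vec4, Mat4) with the same layout as the GLM values listed above; in a real project the same static_assert checks would be applied to glm::vec2, glm::mat4 and so on. It assumes a 4-byte float, and GLM configuration options may change the values reported for the real types.

```cpp
// Hypothetical stand-ins that mimic the plain float layout of the GLM types
// listed above. They are not GLM types.
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };

// These match the sizes and alignments of Metal's packed_float2/3/4.
static_assert(sizeof(Vec2) == 8  && alignof(Vec2) == 4, "Vec2 layout changed");
static_assert(sizeof(Vec3) == 12 && alignof(Vec3) == 4, "Vec3 layout changed");
static_assert(sizeof(Vec4) == 16 && alignof(Vec4) == 4, "Vec4 layout changed");

// Same size as Metal's float4x4, but alignment 4 instead of 16.
static_assert(sizeof(Mat4) == 64 && alignof(Mat4) == 4, "Mat4 layout changed");
```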

According to this answer to another SO question, alignment means the following:

Alignment is a restriction on which memory positions a value's first byte can be stored. (It is needed to improve performance on processors and to permit use of certain instructions that works only on data with particular alignment, for example SSE need to be aligned to 16 bytes, while AVX to 32 bytes.)

Alignment of 16 means that memory addresses that are a multiple of 16 are the only valid addresses.
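To make that definition concrete, here is a small sketch that checks whether an address satisfies a given alignment. The isAligned helper and the two matrix structs are hypothetical names introduced for this example; AlignedMatrix only imitates the 16-byte alignment of Metal's float4x4 on the CPU side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `address` is a multiple of `alignment` bytes.
inline bool isAligned(const void* address, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(address) % alignment == 0;
}

// Hypothetical 64-byte block that must start at a multiple of 16,
// similar to Metal's float4x4.
struct alignas(16) AlignedMatrix { float values[16]; };

// Hypothetical 64-byte block that only needs to start at a multiple of 4,
// similar to glm::mat4 as listed in the table above.
struct PlainMatrix { float values[16]; };

static_assert(alignof(AlignedMatrix) == 16, "expected 16-byte alignment");
static_assert(alignof(PlainMatrix) == 4, "expected 4-byte alignment");
```

An instance of AlignedMatrix always passes isAligned(&instance, 16), whereas an instance of PlainMatrix is only guaranteed to pass isAligned(&instance, 4).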

Therefore we need to be careful to factor in the different alignments when sending 4x4 matrices to Metal shaders. Let's look at an example:

The following struct serves as a buffer to store uniform values to be sent to a Metal vertex shader. Because it contains GLM types, it has to be used from Objective-C++ (or C++) code:

typedef struct
{
  glm::mat4 modelViewProjectionMatrix;
  glm::vec2 windowScale;
  glm::vec4 edgeColor;
  glm::vec4 selectionColor;
} SolidWireframeUniforms;

This struct is defined in a header file that is included wherever it is required in client side (i.e. CPU side) code. To be able to utilize those values on the Metal vertex shader side we need a corresponding data structure. In case of this example the Metal vertex shader part looks as follows:

#include <metal_matrix>
#include <metal_stdlib>

using namespace metal;
    
struct SolidWireframeUniforms
{
  float4x4      modelViewProjectionMatrix;
  packed_float2 windowScale;
  packed_float4 edgeColor;
  packed_float4 selectionColor;
};

// VertexShaderInput struct defined here...

// VertexShaderOutput struct defined here...

vertex VertexShaderOutput solidWireframeVertexShader(VertexShaderInput input [[stage_in]], constant SolidWireframeUniforms &uniforms [[buffer(1)]])
{
  VertexShaderOutput output;
  // vertex shader code that fills in output goes here...
  return output;
}

To transmit data from the client side code to the Metal shader the uniform struct is packaged into a buffer. The below code shows how to create and update that buffer:

- (void)createUniformBuffer
{
  _uniformBuffer = [self.device newBufferWithBytes:(void*)&_uniformData length:sizeof(SolidWireframeUniforms) options:MTLResourceCPUCacheModeDefaultCache];
}


- (void)updateUniforms
{
  dispatch_semaphore_wait(_bufferAccessSemaphore, DISPATCH_TIME_FOREVER);

  SolidWireframeUniforms* uniformBufferContent = (SolidWireframeUniforms*)[_uniformBuffer contents];
  memcpy(uniformBufferContent, &_uniformData, sizeof(SolidWireframeUniforms));

  dispatch_semaphore_signal(_bufferAccessSemaphore);
}

Note the memcpy call that is used to update the buffer. This is where things can go wrong if the size and alignment of the GLM and Metal data types don't match. We simply copy every byte of the client side struct into the buffer, and the Metal shader then interprets those bytes according to its own struct definition. If the two data structures don't have the same layout, the data will be misinterpreted on the Metal shader side.

In the case of that example, the memory layout looks as follows:

                                              104 bytes
           |<--------------------------------------------------------------------------->|
           |                                                                             |
           |         64 bytes              8 bytes         16 bytes         16 bytes     |
           | modelViewProjectionMatrix   windowScale      edgeColor      selectionColor  |
           |<------------------------->|<----------->|<--------------->|<--------------->|
           |                           |             |                 |                 |
           +--+--+--+------------+--+--+--+-------+--+--+-----------+--+--+----------+---+
Byte index | 0| 1| 2|    ...     |62|63|64|  ...  |71|72|    ...    |87|88|   ...    |103|
           +--+--+--+------------+--+--+--+-------+--+--+-----------+--+--+----------+---+
                                        ^             ^                 ^
                                        |             |                 |
                                        |             |                 +-- Is a multiple of 4, aligns with glm::vec4 / packed_float4
                                        |             |
                                        |             +-- Is a multiple of 4, aligns with glm::vec4 / packed_float4
                                        |
                                        +-- Is a multiple of 4, aligns with glm::vec2 / packed_float2

With the exception of the 4x4 matrix alignment, everything matches well. The misalignment of the 4x4 matrix poses no problem here because the matrix sits at byte offset 0, which is a multiple of 16, as visible in the above memory layout. However, if the uniform struct gets modified, alignment or size could become a problem and padding might be necessary in order for it to work properly.
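The offsets shown in the memory layout can be verified at compile time with offsetof. This is a minimal sketch using hypothetical stand-in structs (Vec2, Vec4, Mat4) in place of the GLM types so that it compiles without GLM; the member names are the ones from the uniform struct above.

```cpp
#include <cstddef>

// Hypothetical stand-ins with the same size and alignment as the GLM types.
struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };

// Client side (CPU) layout of the uniform struct from the example.
struct SolidWireframeUniforms
{
    Mat4 modelViewProjectionMatrix;
    Vec2 windowScale;
    Vec4 edgeColor;
    Vec4 selectionColor;
};

// The offsets match the byte indices in the memory layout diagram.
static_assert(offsetof(SolidWireframeUniforms, modelViewProjectionMatrix) == 0, "matrix offset");
static_assert(offsetof(SolidWireframeUniforms, windowScale) == 64, "windowScale offset");
static_assert(offsetof(SolidWireframeUniforms, edgeColor) == 72, "edgeColor offset");
static_assert(offsetof(SolidWireframeUniforms, selectionColor) == 88, "selectionColor offset");
static_assert(sizeof(SolidWireframeUniforms) == 104, "total size");
```

If a member is added or reordered later, one of these checks fails at compile time instead of the shader silently reading the wrong bytes.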

Lastly, there is something else to be aware of. The alignment of the data types has an impact on the size that needs to be allocated for the uniform buffer. Because the largest alignment that occurs in the Metal side SolidWireframeUniforms struct is 16 (from float4x4), the size of that struct is padded up to the next multiple of 16, and the uniform buffer must be at least that long.

This is not the case in the above example, where the buffer length is 104 bytes which is not a multiple of 16. When running the app directly from Xcode, a built-in assertion prints the following message:

validateFunctionArguments:3478: failed assertion `Vertex Function(solidWireframeVertexShader): argument uniforms[0] from buffer(1) with offset(0) and length(104) has space for 104 bytes, but argument has a length(112).'

In order to resolve this, we need to make the size of the buffer a multiple of 16 bytes. To do so we just calculate the next multiple of 16 based on the actual length we need. For 104 that would be 112, which is what the assertion above also tells us.
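Where the 112 comes from can be reproduced on the CPU side. The sketch below mirrors the Metal struct with hypothetical stand-in types (Float4x4, PackedFloat2, PackedFloat4) that copy the sizes and alignments from the table above; it is an approximation of the shader side layout, not code that runs in a Metal shader.

```cpp
#include <cstddef>

// Hypothetical CPU side mirrors of the Metal types, using the sizes and
// alignments listed in the table above.
struct alignas(16) Float4x4 { float values[16]; };   // size 64, alignment 16
struct PackedFloat2 { float x, y; };                 // size  8, alignment  4
struct PackedFloat4 { float x, y, z, w; };           // size 16, alignment  4

struct ShaderSideUniforms
{
    Float4x4     modelViewProjectionMatrix;
    PackedFloat2 windowScale;
    PackedFloat4 edgeColor;
    PackedFloat4 selectionColor;
};

// The members still occupy bytes 0 to 103 ...
static_assert(offsetof(ShaderSideUniforms, selectionColor) == 88, "selectionColor offset");

// ... but the struct inherits the 16-byte alignment of its matrix member,
// so its size is padded from 104 up to the next multiple of 16.
static_assert(alignof(ShaderSideUniforms) == 16, "struct alignment");
static_assert(sizeof(ShaderSideUniforms) == 112, "padded struct size");
```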

The following function calculates the next multiple of 16 for a specified integer:

- (NSUInteger)roundUpToNextMultipleOf16:(NSUInteger)number
{
  NSUInteger remainder = number % 16;

  if(remainder == 0)
  {
    return number;
  }

  return number + 16 - remainder;
}

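Now we calculate the length of the uniform buffer using the above function, which changes the buffer creation method (posted above) as follows. Note that the buffer is allocated with the padded length and the struct is copied in afterwards, because newBufferWithBytes:length: would read bufferLength bytes from _uniformData, which is 8 bytes more than the struct holds:

- (void)createUniformBuffer
{
  // Allocate a length that is a multiple of 16 ...
  NSUInteger bufferLength = [self roundUpToNextMultipleOf16:sizeof(SolidWireframeUniforms)];
  _uniformBuffer = [self.device newBufferWithLength:bufferLength options:MTLResourceCPUCacheModeDefaultCache];

  // ... but copy only the bytes that the client side struct actually occupies.
  memcpy([_uniformBuffer contents], &_uniformData, sizeof(SolidWireframeUniforms));
}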

That should resolve the issue detected by the mentioned assertion.
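If the client code is C++ or Objective-C++ anyway, the rounding can also be done at compile time. The following is a sketch of an equivalent helper; roundUpToMultiple is a hypothetical name and not part of GLM or Metal, and it assumes the multiple is greater than zero.

```cpp
#include <cstddef>

// Rounds `value` up to the next multiple of `multiple`.
// Values that already are a multiple are returned unchanged.
// Assumes multiple > 0.
constexpr std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

// 104 bytes of uniform data need a 112 byte buffer, as the assertion reported.
static_assert(roundUpToMultiple(104, 16) == 112, "104 rounds up to 112");
static_assert(roundUpToMultiple(112, 16) == 112, "exact multiples stay unchanged");
static_assert(roundUpToMultiple(0, 16) == 0, "zero stays zero");
```

The result of roundUpToMultiple(sizeof(SolidWireframeUniforms), 16) could then be used as the buffer length in place of the Objective-C method.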

ackh answered Apr 28 '23 21:04
