Metal Compute pipeline not working on MacOS, but working on iOS

Question

I'm trying to do some GPGPU computations with Metal. I have a basic Metal pipeline that:

creates the required MTLComputePipelineState pipeline and all the associated objects (MTLComputeCommandEncoder, command queue and so on);
creates a target texture for writing (using desc.usage = MTLTextureUsageShaderWrite;);
launches a basic shader to fill this texture with some values (in my experiments, either setting one of the color component to 1 or creating a grey-valued gradient based on the thread coordinates);
reads back the content of this texture from the GPU.

I'm testing this code in 2 setups:

on OSX 10.11 with a MacBook Pro early 2013;
on iOS 9 with an iPhone 6.

The iOS version runs just fine and I get exactly what I ask the shader to do. On OSX however I get a valid (non-nil, with correct size) output texture. However, when fetching the data back all I get is 0 everywhere.

Am I missing a step that would be specific to the OS X implementation? This seems to happen with both the NVIDIA GT650M and the Intel HD4000, or possibly a bug in the runtime?

As I have currently no idea on how to investigate further the issue, any help in this regard would also be greatly appreciated :-)

EDIT - My current implementation

This is the initial (failing) state of my implementation. It's a bit long but mostly boilerplate code to create the pipeline:

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
id<MTLLibrary> library = [device newDefaultLibrary];
id<MTLCommandQueue> commandQueue = [device newCommandQueue];

NSError *error = nil;
id<MTLComputePipelineState> pipeline = [device newComputePipelineStateWithFunction:[library
                                                                                    newFunctionWithName:@"dummy"]
                                                                             error:&error];
if (error)
{
    NSLog(@"%@", [error localizedDescription]);
}
MTLTextureDescriptor *desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                                                width:16
                                                                               height:1
                                                                            mipmapped:NO];
desc.usage = MTLTextureUsageShaderWrite;

id<MTLTexture> texture = [device newTextureWithDescriptor:desc];

MTLSize threadGroupCounts = MTLSizeMake(8, 1, 1);
MTLSize threadGroups = MTLSizeMake([texture width]  / threadGroupCounts.width,
                                   [texture height] / threadGroupCounts.height,
                                   1);

id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

id<MTLComputeCommandEncoder> commandEncoder = [commandBuffer computeCommandEncoder];
[commandEncoder setComputePipelineState:pipeline];

[commandEncoder setTexture:texture atIndex:0];
[commandEncoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadGroupCounts];
[commandEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];

The code used to get the data is the following (I've split the file in two parts to get smaller code chunks):

// Get the data back
uint8_t* imageBytes = malloc([texture width] * [texture height] * 4);
assert(imageBytes);
MTLRegion region = MTLRegionMake2D(0, 0, [texture width], [texture height]);
[texture getBytes:imageBytes bytesPerRow:[texture width]*4 fromRegion:region mipmapLevel:0];
for (int i = 0; i < 16; ++i)
{
    NSLog(@"Pix = %d %d %d %d",
          *((uint8_t*)imageBytes + 4 * i),
          *((uint8_t*)imageBytes + 4 * i + 1),
          *((uint8_t*)imageBytes + 4 * i + 2),
          *((uint8_t*)imageBytes + 4 * i + 3));
}

And this is the shader code (writing 1 to red and alpha, should become 0xff in the output buffer when read on the host):

#include <metal_stdlib>

using namespace metal;

kernel void dummy(texture2d<float, access::write> outTexture [[ texture(0) ]],
                  uint2 gid [[ thread_position_in_grid ]])
{
    outTexture.write(float4(1.0, 0.0, 0.0, 1.0), gid);
}

Denn Nevera · Accepted Answer

I suppose you didn't call - synchronizeTexture:slice:level: May be the follow example (part of jpeg-turbo writer class implementation) can solve your issue:

    row_stride = (int)cinfo.image_width  * cinfo.input_components; /* JSAMPLEs per row in image_buffer */

uint   counts        = cinfo.image_width * 4;
uint   componentSize = sizeof(uint8);
uint8 *tmp = NULL;
if (texture.pixelFormat == MTLPixelFormatRGBA16Unorm) {
    tmp  = malloc(row_stride);
    row_stride *= 2;
    componentSize = sizeof(uint16);
}

//
// Synchronize texture with host memory 
//
id<MTLCommandQueue> queue             = [texture.device newCommandQueue];
id<MTLCommandBuffer> commandBuffer    = [queue commandBuffer];
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];

[blitEncoder synchronizeTexture:texture slice:0 level:0];
[blitEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];

void       *image_buffer  = malloc(row_stride);

int j=0;
while (cinfo.next_scanline < cinfo.image_height) {

    MTLRegion region = MTLRegionMake2D(0, cinfo.next_scanline, cinfo.image_width, 1);

    [texture getBytes:image_buffer
                   bytesPerRow:cinfo.image_width * 4 * componentSize
                    fromRegion:region
                   mipmapLevel:0];

    if (texture.pixelFormat == MTLPixelFormatRGBA16Unorm) {
        uint16 *s = image_buffer;
        for (int i=0; i<counts; i++) {
            tmp[i] = (s[i]>>8) & 0xff;
            j++;
        }
        row_pointer[0] = tmp;
    }
    else{
        row_pointer[0] = image_buffer;
    }
    (void) jpeg_write_scanlines(&cinfo, row_pointer, 1);
}

free(image_buffer);
if (tmp != NULL) free(tmp);

It was tested on mid 2012 mac book pro with NVIDIA GeForce GT 650M 1024 МБ.

Discussion on Apple developer forums.

Metal Compute pipeline not working on MacOS, but working on iOS

Tags:

macos

ios

gpgpu

metal

sansuiso

1 Answers

Denn Nevera

Recent Activity

Donate For Us

Metal Compute pipeline not working on MacOS, but working on iOS

Tags:

macos

ios

gpgpu

metal

sansuiso

1 Answers

Denn Nevera

Related questions

Recent Activity

Donate For Us