I'm trying to do some GPGPU computations with Metal. I have a basic Metal pipeline that:
MTLComputePipelineState
pipeline and all the associated objects (MTLComputeCommandEncoder
, command queue and so on);desc.usage = MTLTextureUsageShaderWrite;
);I'm testing this code in 2 setups:
The iOS version runs just fine and I get exactly what I ask the shader to do. On OSX however I get a valid (non-nil, with correct size) output texture. However, when fetching the data back all I get is 0 everywhere.
Am I missing a step that would be specific to the OS X implementation? This seems to happen with both the NVIDIA GT650M and the Intel HD4000, or possibly a bug in the runtime?
As I have currently no idea on how to investigate further the issue, any help in this regard would also be greatly appreciated :-)
EDIT - My current implementation
This is the initial (failing) state of my implementation. It's a bit long but mostly boilerplate code to create the pipeline:
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
id<MTLLibrary> library = [device newDefaultLibrary];
id<MTLCommandQueue> commandQueue = [device newCommandQueue];
NSError *error = nil;
id<MTLComputePipelineState> pipeline = [device newComputePipelineStateWithFunction:[library
newFunctionWithName:@"dummy"]
error:&error];
if (error)
{
NSLog(@"%@", [error localizedDescription]);
}
MTLTextureDescriptor *desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
width:16
height:1
mipmapped:NO];
desc.usage = MTLTextureUsageShaderWrite;
id<MTLTexture> texture = [device newTextureWithDescriptor:desc];
MTLSize threadGroupCounts = MTLSizeMake(8, 1, 1);
MTLSize threadGroups = MTLSizeMake([texture width] / threadGroupCounts.width,
[texture height] / threadGroupCounts.height,
1);
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
id<MTLComputeCommandEncoder> commandEncoder = [commandBuffer computeCommandEncoder];
[commandEncoder setComputePipelineState:pipeline];
[commandEncoder setTexture:texture atIndex:0];
[commandEncoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadGroupCounts];
[commandEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
The code used to get the data is the following (I've split the file in two parts to get smaller code chunks):
// Get the data back
uint8_t* imageBytes = malloc([texture width] * [texture height] * 4);
assert(imageBytes);
MTLRegion region = MTLRegionMake2D(0, 0, [texture width], [texture height]);
[texture getBytes:imageBytes bytesPerRow:[texture width]*4 fromRegion:region mipmapLevel:0];
for (int i = 0; i < 16; ++i)
{
NSLog(@"Pix = %d %d %d %d",
*((uint8_t*)imageBytes + 4 * i),
*((uint8_t*)imageBytes + 4 * i + 1),
*((uint8_t*)imageBytes + 4 * i + 2),
*((uint8_t*)imageBytes + 4 * i + 3));
}
And this is the shader code (writing 1 to red and alpha, should become 0xff in the output buffer when read on the host):
#include <metal_stdlib>
using namespace metal;
kernel void dummy(texture2d<float, access::write> outTexture [[ texture(0) ]],
uint2 gid [[ thread_position_in_grid ]])
{
outTexture.write(float4(1.0, 0.0, 0.0, 1.0), gid);
}
I suppose you didn't call - synchronizeTexture:slice:level: May be the follow example (part of jpeg-turbo writer class implementation) can solve your issue:
row_stride = (int)cinfo.image_width * cinfo.input_components; /* JSAMPLEs per row in image_buffer */
uint counts = cinfo.image_width * 4;
uint componentSize = sizeof(uint8);
uint8 *tmp = NULL;
if (texture.pixelFormat == MTLPixelFormatRGBA16Unorm) {
tmp = malloc(row_stride);
row_stride *= 2;
componentSize = sizeof(uint16);
}
//
// Synchronize texture with host memory
//
id<MTLCommandQueue> queue = [texture.device newCommandQueue];
id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder synchronizeTexture:texture slice:0 level:0];
[blitEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
void *image_buffer = malloc(row_stride);
int j=0;
while (cinfo.next_scanline < cinfo.image_height) {
MTLRegion region = MTLRegionMake2D(0, cinfo.next_scanline, cinfo.image_width, 1);
[texture getBytes:image_buffer
bytesPerRow:cinfo.image_width * 4 * componentSize
fromRegion:region
mipmapLevel:0];
if (texture.pixelFormat == MTLPixelFormatRGBA16Unorm) {
uint16 *s = image_buffer;
for (int i=0; i<counts; i++) {
tmp[i] = (s[i]>>8) & 0xff;
j++;
}
row_pointer[0] = tmp;
}
else{
row_pointer[0] = image_buffer;
}
(void) jpeg_write_scanlines(&cinfo, row_pointer, 1);
}
free(image_buffer);
if (tmp != NULL) free(tmp);
It was tested on mid 2012 mac book pro with NVIDIA GeForce GT 650M 1024 МБ.
Discussion on Apple developer forums.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With