This is a follow-up question to this answer. I am trying to replace a for-loop running on the CPU with a Metal kernel function to parallelize the computation and improve performance.
My function is basically a convolution. Since I repeatedly receive new data for my input array values (the data comes from an AVCaptureSession), it seems that using newBufferWithBytesNoCopy:length:options:deallocator: is the sensible option for creating the MTLBuffer objects. Here is the relevant code:
id <MTLBuffer> dataBuffer = [device newBufferWithBytesNoCopy:dataVector length:sizeof(dataVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> filterBuffer = [device newBufferWithBytesNoCopy:filterVector length:sizeof(filterVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> outBuffer = [device newBufferWithBytesNoCopy:outVector length:sizeof(outVector) options:MTLResourceStorageModeShared deallocator:nil];
When running this I get the following error:
failed assertion `newBufferWithBytesNoCopy:pointer 0x16fd0bd48 is not 4096 byte aligned.'
Right now, I am not allocating any memory, but (for testing purposes) just creating an empty array of floats of a fixed size and filling it up with random numbers. So my main question is:
How do I allocate these arrays of floats correctly so that the following requirement is met?
This value must result in a page-aligned region of memory.
Also, some additional questions:
Does it even make sense to create the MTLBuffer with the newBufferWithBytesNoCopy method, or is copying the data not really an issue in terms of performance? (My actual data will consist of approximately 43'000 float values per video frame.)
Is MTLResourceStorageModeShared the correct choice for MTLResourceOptions?
The API reference says
The storage allocation of the returned new MTLBuffer object is the same as the pointer input value. The existing memory allocation must be covered by a single VM region, typically allocated with vm_allocate or mmap. Memory allocated by malloc is specifically disallowed.
Does this apply only to the output buffer, or should the storage allocation for all objects used with MTLBuffer not be done with malloc?
The easiest way to allocate page-aligned memory is with posix_memalign. Here's a complete example of creating a buffer with page-aligned memory:
void *data = NULL;
NSUInteger pageSize = getpagesize();
// Required byte count, rounded up to the next multiple of the page size.
NSUInteger allocationSize = pageSize * 10;
// posix_memalign returns 0 on success and leaves data page-aligned.
int result = posix_memalign(&data, pageSize, allocationSize);
if (result == 0 && data) {
    id<MTLBuffer> buffer = [device newBufferWithBytesNoCopy:data
                                                     length:allocationSize
                                                    options:MTLResourceStorageModeShared
                                                deallocator:^(void *pointer, NSUInteger length) {
        // Metal calls this when the buffer is deallocated; release the memory here.
        free(pointer);
    }];
    NSLog(@"Created buffer of length %lu", (unsigned long)buffer.length);
}
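The example above hard-codes the allocation size as ten pages. As a rough sketch of how the "rounded up to next multiple of page size" step could be computed for an array of floats, here is a plain C version (which also compiles inside Objective-C). The helper names round_up_to_page, is_page_aligned, and alloc_page_aligned_floats are made up for illustration and are not part of Metal or any system library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round byte_count up to the next multiple of page_size (page_size > 0). */
size_t round_up_to_page(size_t byte_count, size_t page_size) {
    assert(page_size > 0);
    size_t remainder = byte_count % page_size;
    return remainder == 0 ? byte_count : byte_count + (page_size - remainder);
}

/* Returns non-zero if pointer lies on a page_size boundary. */
int is_page_aligned(const void *pointer, size_t page_size) {
    return ((uintptr_t)pointer % page_size) == 0;
}

/* Hypothetical helper: allocate page-aligned storage for float_count floats.
   On success, returns the pointer and stores the padded byte count in
   *out_size. On failure, returns NULL. Release the memory with free(). */
float *alloc_page_aligned_floats(size_t float_count, size_t *out_size) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = round_up_to_page(float_count * sizeof(float), page_size);
    void *memory = NULL;
    if (padded == 0 || posix_memalign(&memory, page_size, padded) != 0) {
        return NULL;
    }
    *out_size = padded;
    return memory;
}
```

With 4096-byte pages, 43'000 floats (172'000 bytes) would be padded to 172'032 bytes, i.e. 42 pages. The padded size, not the raw byte count, is what you would pass as the length when creating the no-copy buffer.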
Since you can't ensure that your data will arrive in a page-aligned pointer, you'll probably be better off just allocating an MTLBuffer of whatever size can accommodate your data, without using the no-copy variant. If you need to do real-time processing of the data, you should create a pool of buffers and cycle among them instead of waiting for each command buffer to complete. The Shared storage mode is correct for these use cases. The caveat related to malloc only applies to the no-copy case, since in every other case, Metal allocates the memory for you.
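To illustrate the buffer pool idea, here is a minimal bookkeeping sketch in plain C. It only tracks which slots are free; each slot index would correspond to one preallocated MTLBuffer. The names BufferPool, pool_init, pool_acquire, and pool_release, as well as the pool size of three, are assumptions chosen for the example and not anything Metal provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 3 /* assumed number of frames in flight */

typedef struct {
    bool in_use[POOL_SIZE]; /* true while the GPU may still be using the slot */
    size_t next;            /* slot to try first on the next acquire */
} BufferPool;

/* Mark every slot as free. */
void pool_init(BufferPool *pool) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool->in_use[i] = false;
    }
    pool->next = 0;
}

/* Returns the index of a free slot and marks it busy,
   or -1 if every slot is still in use. */
int pool_acquire(BufferPool *pool) {
    for (size_t step = 0; step < POOL_SIZE; step++) {
        size_t slot = (pool->next + step) % POOL_SIZE;
        if (!pool->in_use[slot]) {
            pool->in_use[slot] = true;
            pool->next = (slot + 1) % POOL_SIZE;
            return (int)slot;
        }
    }
    return -1;
}

/* Marks a slot as free again, e.g. once its command buffer has completed. */
void pool_release(BufferPool *pool, int slot) {
    assert(slot >= 0 && slot < POOL_SIZE);
    pool->in_use[slot] = false;
}
```

In a real app you would presumably call pool_acquire when a new frame arrives, copy the frame data into the corresponding buffer, and call pool_release from the command buffer's completion handler. This sketch is not thread-safe; since completion handlers run on a different thread, actual code would need synchronization around the pool, for example a semaphore.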