Anyone knows a proper way to calculate mean value of the buffer with random float numbers in the metal kernel?
Dispatching work on the compute command encoder:
threadsPerGroup = MTLSizeMake(1, 1, inputTexture.arrayLength);
numThreadGroups = MTLSizeMake(1, 1, inputTexture.arrayLength / threadsPerGroup.depth);
[commandEncoder dispatchThreadgroups:numThreadGroups
threadsPerThreadgroup:threadsPerGroup];
Kernel code:
kernel void mean(texture2d_array<float, access::read> inTex [[ texture(0) ]],
device float *means [[ buffer(1) ]],
uint3 id [[ thread_position_in_grid ]]) {
if (id.x == 0 && id.y == 0) {
float mean = 0.0;
for (uint i = 0; i < inTex.get_width(); ++i) {
for (uint j = 0; j < inTex.get_height(); ++j) {
mean += inTex.read(uint2(i, j), id.z)[0];
}
}
float textureArea = inTex.get_width() * inTex.get_height();
mean /= textureArea;
out[id.z] = mean;
}
}
The buffer is represented in the texture of texture2d_array type with R32Float pixel format.
If you can use an array of uint (instead of float) as your data source, I would suggest using an "Atomic Fetch and Modify functions" (as described in the metal shading language spec) to write atomically to a buffer.
Here's an example of a kernel function which takes an input buffer (data: an array of Float) and writes the sum of the buffer into an atomic buffer (sum, a pointer to a uint):
kernel void sum(device uint *data [[ buffer(0) ]],
volatile device atomic_uint *sum [[ buffer(1) ]],
uint gid [[ thread_position_in_grid ]])
{
atomic_fetch_add_explicit(sum, data[gid], memory_order_relaxed);
}
In your swift file, you would set the buffers:
...
let data: [UInt] = [1, 2, 3, 4]
let dataBuffer = device.makeBuffer(bytes: &data, length: (data.count * MemoryLayout<UInt>.size), options: [])
commandEncoder.setBuffer(dataBuffer, offset: 0, at: 0)
var sum:UInt = 0
let sumBuffer = device!.makeBuffer(bytes: &sum, length: MemoryLayout<UInt>.size, options: [])
commandEncoder.setBuffer(sumBuffer, offset: 0, at: 1)
commandEncoder.endEncoding()
Commit, wait and then fetch the data from the GPU:
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
let nsData = NSData(bytesNoCopy: sumBuffer.contents(),
length: sumBuffer.length,
freeWhenDone: false)
nsData.getBytes(&sum, length:sumBuffer.length)
let mean = Float(sum/data.count)
print(mean)
Alternatively, if your initial data source has to be an array of float, you could use the vDSP_meanv method of the Accelerate framework which is very fast for such computation.
I Hope that helped, cheers!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With