
iOS Metal: multiple kernel calls in one command buffer

Tags: ios, swift, gpu, metal

I'm having a problem running multiple kernel functions in Metal from Swift.

My goal is to implement a block-wise DCT over an image. The DCT is implemented as two matrix multiplications:

J = H * I * H^-1
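
Written out per element, the two multiplications can be read as the two passes performed by the dct1 and dct2 kernels shown further down. This is only a sketch of that reading: T (the intermediate image), M_1 and M_2 (the contents of dctMatrix1 and dctMatrix2), and (x_0, y_0) (the first pixel of the current 8x8 block) are names introduced here for illustration.

```latex
% Pass 1 (dct1): multiply each block from the left, along x
T[x_0 + i][y] = \sum_{k=0}^{7} M_1[i][k] \, I[x_0 + k][y], \qquad i = 0, \dots, 7

% Pass 2 (dct2): multiply each block from the right, along y
J[x][y_0 + j] = \sum_{k=0}^{7} T[x][y_0 + k] \, M_2[j][k], \qquad j = 0, \dots, 7

% Per block this is J = M_1 \, I \, M_2^{T}, which equals H I H^{-1}
% when M_1 = H and M_2^{T} = H^{-1}.
```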

The following code shows the kernel functions themselves and the calls used in the Swift code. Each kernel function works when I run it alone, but I can't manage to hand the buffer written by the first kernel function over to the second one. The second function therefore always returns a buffer filled with zeros.

All image input and output buffers hold 400x400 pixels in RGB, with a 16-bit integer for each component. The matrices are 8x8 with 16-bit integer entries.
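
As a quick check of these sizes, the buffer lengths can be worked out as in the sketch below. The variable names are made up for illustration. The last value assumes one 8x8 threadgroup per 8x8 block, which is a guess, because the values of `blocks` and `dctSize` are not shown in the Swift code.

```swift
// Sizes taken from the description: 400x400 pixels, 3 channels, 16-bit components.
let imageSide = 400
let channels = 3
let bytesPerComponent = MemoryLayout<Int16>.size  // 2 bytes

// Byte length each image buffer needs in order to back the Image3D16 struct.
let imageBytes = imageSide * imageSide * channels * bytesPerComponent

// Byte length of one 8x8 matrix of 16-bit integers (the Matrix struct).
let blockSide = 8
let matrixBytes = blockSide * blockSide * bytesPerComponent

// ASSUMPTION: one 8x8 threadgroup per 8x8 block, so 50 groups per side.
let groupsPerSide = imageSide / blockSide

print(imageBytes, matrixBytes, groupsPerSide)  // prints "960000 128 50"
```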

Is there a special command needed to synchronize the buffer read and write accesses of the different kernel functions? Or am I doing something else wrong?

Thanks for your help

shaders.metal

struct Image3D16{
    short data[400][400][3];
};

struct Matrix{
    short data[8 * 8];
};

kernel void dct1(device Image3D16 *inputImage [[buffer(0)]],
            device Image3D16 *outputImage [[buffer(1)]],
            device Matrix *mult [[buffer(2)]],
            uint2 gid [[thread_position_in_grid]],
            uint2 tid [[thread_position_in_threadgroup]]){

    int red = 0, green = 0, blue = 0;

    for(int x=0;x<8;x++){
        short r = inputImage->data[gid.x-tid.x + x][gid.y][0];
        short g = inputImage->data[gid.x-tid.x + x][gid.y][1];
        short b = inputImage->data[gid.x-tid.x + x][gid.y][2];

        red += r * mult->data[tid.x*8 + x];
        green += g * mult->data[tid.x*8 + x];
        blue += b * mult->data[tid.x*8 + x];
    }

    outputImage->data[gid.x][gid.y][0] = red;
    outputImage->data[gid.x][gid.y][1] = green;
    outputImage->data[gid.x][gid.y][2] = blue;
}

kernel void dct2(device Image3D16 *inputImage [[buffer(0)]],
             device Image3D16 *outputImage [[buffer(1)]],
             device Matrix *mult [[buffer(2)]],
             uint2 gid [[thread_position_in_grid]],
             uint2 tid [[thread_position_in_threadgroup]]){

    int red = 0, green = 0, blue = 0;
    for(int y=0;y<8;y++){
        short r = inputImage->data[gid.x][gid.y-tid.y + y][0];
        short g = inputImage->data[gid.x][gid.y-tid.y + y][1];
        short b = inputImage->data[gid.x][gid.y-tid.y + y][2];

        red += r * mult->data[tid.y*8 + y];
        green += g * mult->data[tid.y*8 + y];
        blue += b * mult->data[tid.y*8 + y];
    }

    outputImage->data[gid.x][gid.y][0] = red;
    outputImage->data[gid.x][gid.y][1] = green;
    outputImage->data[gid.x][gid.y][2] = blue;
}

ViewController.swift

...

let commandBuffer = commandQueue.commandBuffer()
let computeEncoder1 = commandBuffer.computeCommandEncoder()
computeEncoder1.setComputePipelineState(computeDCT1)
computeEncoder1.setBuffer(input, offset: 0, atIndex: 0)
computeEncoder1.setBuffer(tmpBuffer3D1, offset: 0, atIndex: 1)
computeEncoder1.setBuffer(dctMatrix1, offset: 0, atIndex: 2)
computeEncoder1.dispatchThreadgroups(blocks, threadsPerThreadgroup: dctSize)
computeEncoder1.endEncoding()

let computeEncoder2 = commandBuffer.computeCommandEncoder()
computeEncoder2.setComputePipelineState(computeDCT2)
computeEncoder2.setBuffer(tmpBuffer3D1, offset: 0, atIndex: 0)
computeEncoder2.setBuffer(output, offset: 0, atIndex: 1)
computeEncoder2.setBuffer(dctMatrix2, offset: 0, atIndex: 2)
computeEncoder2.dispatchThreadgroups(blocks, threadsPerThreadgroup: dctSize)
computeEncoder2.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
Rolf Lussi asked Sep 16 '25 19:09


1 Answer

I found the error. My kernel function tried to read outside its allocated memory. When that happens, Metal stops executing all following commands in the command buffer. The output was therefore always zero, because the computation never ran. The application's GPU usage drops when this occurs, which can be used to detect the error.
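
One way to catch this kind of problem before or after dispatch is sketched below. It is not the exact check used here, and which access went out of range is not stated above. The helpers `dispatchFitsImage`, `bufferHoldsImage` and `reportFailure` are hypothetical names. The Metal part uses the `status` and `error` properties of `MTLCommandBuffer` in current Swift spelling, which differs from the older API spelling in the question, and whether a given out-of-range access is reported there may depend on the device.

```swift
#if canImport(Metal)
import Metal
#endif

/// Hypothetical helper: true when every thread of the dispatch maps to a pixel
/// inside a square image with `imageSide` pixels per side.
func dispatchFitsImage(groupsPerSide: Int, threadsPerGroupSide: Int, imageSide: Int) -> Bool {
    return groupsPerSide * threadsPerGroupSide <= imageSide
}

/// Hypothetical helper: true when a buffer of `bufferBytes` bytes is large enough
/// to back an image of 16-bit components, such as the kernels' Image3D16 struct.
func bufferHoldsImage(bufferBytes: Int, imageSide: Int, channels: Int = 3) -> Bool {
    return bufferBytes >= imageSide * imageSide * channels * MemoryLayout<Int16>.size
}

#if canImport(Metal)
/// Call after waitUntilCompleted(): prints the error if the command buffer failed.
func reportFailure(of commandBuffer: MTLCommandBuffer) {
    if commandBuffer.status == .error {
        print("Command buffer failed: \(String(describing: commandBuffer.error))")
    }
}
#endif
```

For example, `dispatchFitsImage(groupsPerSide: 51, threadsPerGroupSide: 8, imageSide: 400)` is false, because 408 threads per side would index past a 400-pixel image.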

Rolf Lussi answered Sep 19 '25 09:09