Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

I'm working on macOS project that uses Swift and Metal for image processing on the GPU. Last week, I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...

After a lot of digging, I found that the problem is related to which GPU is used by Metal (AMD Radeon Pro 455 or Intel(R) HD Graphics 530) to do the computation.

Initializing the MTLDevice using MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs (while MTLCreateSystemDefaultDevice() returns the default device which is the Radeon). In any case, the code works as expected with the Intel GPU but that is not the case with the Radeon GPU.

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colour to an output texture:

    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

I order to use this kernel, I use this piece of code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary()
    let commandQueue = device.makeCommandQueue()

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough")

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()

    commandEncoder.setComputePipelineState(cps)

    // Texture setup
    let width = 16
    let height = 16
    let byteCount = height*width*4
    let bytesPerRow = width*4
    let region = MTLRegionMake2D(0, 0, width, height)
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)

    // inTexture
    var inData = [UInt8](repeating: 255, count: Int(byteCount))
    let inTexture = device.makeTexture(descriptor: textureDescriptor)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow)

    // outTexture
    var outData = [UInt8](repeating: 128, count: Int(byteCount))
    let outTexture = device.makeTexture(descriptor: textureDescriptor)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow)

    commandEncoder.setTexture(inTexture, at: 0)
    commandEncoder.setTexture(outTexture, at: 1)
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Get the data back from the GPU
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0)

    // Validation
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() {
        if outElement != inData[i] {
            print("Dest: \(outElement) != Src: \(inData[i]) at \(i))")
        }
    }

When running this code with let device = devices[0] (Radeon GPU), outTexture is never written to (my supposition) and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (Intel GPU), everything works as expected and outData is updated with the values in inData.

like image 556
Steve Begin Avatar asked Nov 24 '16 19:11

Steve Begin


People also ask

How to tell if Mac is using GPU?

Check if the discrete or integrated GPU is in use To see which graphics cards are in use, choose Apple () menu > About this Mac. The graphics cards currently in use appear next to Graphics.

What is core GPU in Mac?

The Apple M1 Pro 16-Core-GPU is an integrated graphics card by Apple offering all 16 cores in the M1 Pro Chip. The 2048 ALUs offer a theoretical performance of up to 5.3 Teraflops. The graphics card has no dedicated graphics memory but can use the fast LPDDR5-6400 unified memory with a 256 bit bus (up to 200 GBit/s).

What is CPU and GPU in Macbook?

A CPU (central processing unit) works together with a GPU (graphics processing unit) to increase the throughput of data and the number of concurrent calculations within an application.


1 Answers

I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture and you then want to read that resource from the CPU (e.g. using getBytes()), you need to synchronize it using a blit encoder. Try putting the following above the commandBuffer.commit() line:

let blitEncoder = commandBuffer.makeBlitCommandEncoder()
blitEncoder.synchronize(outTexture)
blitEncoder.endEncoding()

You may get away without this on an integrated GPU because the GPU is using system memory for the resource and there's nothing to synchronize.

like image 150
Ken Thomases Avatar answered Sep 20 '22 23:09

Ken Thomases