On Macs with discrete graphics cards, Managed buffers should be used instead of Shared buffers, but they need extra work to stay synchronised: after the CPU writes to the buffer, you must call -[MTLBuffer didModifyRange:].
However, on Apple Silicon, if I force the use of Managed buffers by pretending that [MTLDevice hasUnifiedMemory] returns NO and remove the calls to didModifyRange:, the rendering still works just fine.
What's the best way to test Managed buffers on Apple Silicon where the GPU memory is unified so that I can be sure my code will work on older Macs?
The best practice for testing hardware compatibility is on the actual hardware in which you are testing compatibility. If you plan on supporting discrete GPUs, which are substantially different from Apple Silicon, it would be best to have access to one for testing.
You might approximate the behavior, but remember that it is only an emulation, and there is no way to ensure that the actual hardware will behave the same way.
It would be akin to developing with the Simulator only, which is not at all a good practice.
UPDATE: There are numerous services that rent access to bare-metal Macs. The MacInCloud service allows you to configure a machine with an external GPU (such as an AMD RX 580). It is only $0.99 for the first 24 hours.
There are many similar services out there, but that is the first one where I was able to verify that discrete GPUs are an option.
In my experience, there is no single best practice for testing code when it comes to rendering APIs, since there are many different factors involved: GPU and CPU vendors (Apple, AMD, Intel), operating systems, and drivers.
I agree with Jeshua:
The best practice for testing hardware compatibility is on the actual hardware in which you are testing compatibility.
There are several things that can make development and testing easier:
You can detect the GPU vendor from the device name:
#import <Metal/Metal.h>

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
// -containsString: returns a BOOL
BOOL appleGPU = [device.name containsString:@"Apple"];
BOOL intelGPU = [device.name containsString:@"Intel"];
BOOL amdGPU = [device.name containsString:@"AMD"];
// Case-insensitive match, since the name may be spelled "NVIDIA"
BOOL nvidiaGPU = [device.name localizedCaseInsensitiveContainsString:@"Nvidia"];
With the following properties you can find your GPU type:
BOOL externalGPU = [device isRemovable];    // external GPU (eGPU)
BOOL integratedGPU = [device isLowPower];   // integrated GPU
BOOL discreteGPU = ![device isLowPower];    // see the note below for Apple silicon
Note: on a Mac with an Apple silicon M1 chip, [device isLowPower] is NO because the GPU runs with both high performance and low power.
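Because isLowPower alone cannot tell a discrete GPU from Apple silicon, one option is to combine it with the other two flags. The following is only a sketch in plain C (which also compiles as Objective-C); GpuKind and classify_gpu are hypothetical names, and the decision order is my own assumption rather than anything Apple prescribes:

```c
#include <stdbool.h>

/* Hypothetical categories for illustration only. */
typedef enum {
    GPU_KIND_EXTERNAL,      /* removable eGPU */
    GPU_KIND_INTEGRATED,    /* low-power integrated GPU */
    GPU_KIND_APPLE_SILICON, /* unified memory, not low power */
    GPU_KIND_DISCRETE       /* dedicated memory, not removable */
} GpuKind;

/* The three inputs are meant to mirror device.hasUnifiedMemory,
   [device isRemovable] and [device isLowPower]. Taking plain booleans
   keeps the logic testable on any machine. */
static GpuKind classify_gpu(bool has_unified_memory,
                            bool is_removable,
                            bool is_low_power)
{
    if (is_removable) {
        return GPU_KIND_EXTERNAL;
    }
    if (is_low_power) {
        return GPU_KIND_INTEGRATED;
    }
    /* Not low power: unified memory suggests Apple silicon (assumption),
       otherwise treat the device as a discrete GPU. */
    return has_unified_memory ? GPU_KIND_APPLE_SILICON : GPU_KIND_DISCRETE;
}
```

In an app the arguments would come from the MTLDevice itself; note that hasUnifiedMemory is only available from macOS 10.15.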
Determine TBDR GPU architecture:
if (@available(macOS 10.15, *)) {
    if ([device supportsFamily:MTLGPUFamilyApple4]) {
        // The GPU supports tile-based deferred rendering (TBDR)
    }
}
Understand the Managed Mode:
In a unified memory model, a resource with MTLStorageModeManaged resides in system memory accessible to both the CPU and the GPU. It behaves like MTLStorageModeShared and has only one copy of the content.
Note:
In a unified memory model, Metal may ignore synchronization calls completely because it only creates a single memory allocation for the resource.
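This is why the experiment in the question cannot reveal a missing didModifyRange: call. A toy model in plain C (which also compiles as Objective-C) may make the difference concrete. It is only a sketch under my own assumptions: ManagedBufferModel and the model_* functions are hypothetical, they do not call Metal, and they only mimic the "one copy versus two copies" behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CAPACITY 64

/* Toy stand-in for a Managed buffer; not Metal's implementation. */
typedef struct {
    unsigned char cpu_copy[MODEL_CAPACITY]; /* what buffer.contents would expose */
    unsigned char gpu_copy[MODEL_CAPACITY]; /* second copy, used only when !unified */
    bool unified;                           /* true: a single allocation is shared */
} ManagedBufferModel;

static void model_init(ManagedBufferModel *b, bool unified)
{
    memset(b, 0, sizeof *b);
    b->unified = unified;
}

/* CPU-side write, like a memcpy into buffer.contents. */
static void model_cpu_write(ManagedBufferModel *b, size_t offset,
                            const void *src, size_t length)
{
    assert(offset <= MODEL_CAPACITY && length <= MODEL_CAPACITY - offset);
    memcpy(b->cpu_copy + offset, src, length);
}

/* Plays the role of didModifyRange: it pushes the range to the GPU copy.
   With unified memory there is nothing to copy, so it is a no-op. */
static void model_did_modify_range(ManagedBufferModel *b, size_t offset,
                                   size_t length)
{
    assert(offset <= MODEL_CAPACITY && length <= MODEL_CAPACITY - offset);
    if (!b->unified) {
        memcpy(b->gpu_copy + offset, b->cpu_copy + offset, length);
    }
}

/* What a shader would see when it reads one byte. */
static unsigned char model_gpu_read(const ManagedBufferModel *b, size_t index)
{
    assert(index < MODEL_CAPACITY);
    return b->unified ? b->cpu_copy[index] : b->gpu_copy[index];
}
```

In the unified configuration a CPU write is visible to model_gpu_read immediately, whether or not model_did_modify_range is called. In the discrete configuration the GPU side keeps returning the old bytes until the range is flagged, which is the kind of bug that only real discrete hardware will expose.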
You can also check some implementations by other developers:
PixarAnimationStudios/USD:
HgiMetalCapabilities::HgiMetalCapabilities(id<MTLDevice> device)
{
    if (@available(macOS 10.14.5, ios 12.0, *)) {
        _SetFlag(HgiDeviceCapabilitiesBitsConcurrentDispatch, true);
    }

    defaultStorageMode = MTLResourceStorageModeShared;
    bool unifiedMemory = false;
    if (@available(macOS 100.100, ios 12.0, *)) {
        unifiedMemory = true;
    } else if (@available(macOS 10.15, ios 13.0, *)) {
#if defined(ARCH_OS_IOS) || (defined(__MAC_10_15) && __MAC_OS_X_VERSION_MAX_ALLOWED >= __MAC_10_15)
        unifiedMemory = [device hasUnifiedMemory];
#else
        unifiedMemory = [device isLowPower];
#endif
    }

    _SetFlag(HgiDeviceCapabilitiesBitsUnifiedMemory, unifiedMemory);

#if defined(ARCH_OS_MACOS)
    if (!unifiedMemory) {
        defaultStorageMode = MTLResourceStorageModeManaged;
    }
#endif
}
KhronosGroup/MoltenVK:
// Metal Managed:
//   - applies to both buffers and textures
//   - default mode for textures on macOS
//   - two copies of each buffer or texture when discrete memory available
//   - convenience of shared mode, performance of private mode
//   - on unified systems behaves like shared memory and has only one copy of content
//   - when writing, use:
//       - buffer didModifyRange:
//       - texture replaceRegion:
//   - when reading, use:
//       - encoder synchronizeResource: followed by
//       - cmdbuff waitUntilCompleted (or completion handler)
//       - buffer/texture getBytes: