How can one make use of Nvidia's tensor cores (in a compute shader?!) using Vulkan?
There is an article by Nvidia, "Programming Tensor Cores in CUDA 9", but it obviously focuses on CUDA. I am not too familiar with CUDA, but it looks like some measures must be taken to enable computations on the tensor cores: the algorithm must be set to some kind of special type, and the math type must be set to the value CUDNN_TENSOR_OP_MATH. I am wondering whether tensor core acceleration could also be used from other APIs, and I am especially interested in Vulkan.
More specifically, I'd like to dig a bit deeper into filters for denoising. To my understanding, filters mostly require exactly the mathematical operations that tensor cores are able to accelerate, namely matrix-multiply-and-accumulate operations.
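To see why a filter maps onto matrix-multiply-and-accumulate, here is a minimal CPU sketch in C using the well-known "im2col" rewrite (named plainly here; this is not a claim about what cuDNN or the Vulkan driver does internally). The image size, filter size and all function names are hypothetical. The direct filter and the matrix form compute the same output, and the matrix form has exactly the shape D = A * B + C.

```c
#include <assert.h>

/* Hypothetical sizes for illustration: a 5x5 single-channel image and a
   3x3 filter, "valid" borders only, giving a 3x3 output. */
#define H 5
#define W 5
#define K 3
#define OH (H - K + 1)
#define OW (W - K + 1)

/* Direct filter: each output pixel is a weighted sum of its K x K
   neighbourhood (cross-correlation, the filter is not flipped). */
static void filter_direct(float img[H][W], float f[K][K], float out[OH][OW]) {
    for (int y = 0; y < OH; ++y)
        for (int x = 0; x < OW; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < K; ++i)
                for (int j = 0; j < K; ++j)
                    acc += img[y + i][x + j] * f[i][j];
            out[y][x] = acc;
        }
}

/* The same filter as a matrix multiply-accumulate D = A * B + C:
   B ("im2col") is a (K*K) x (OH*OW) matrix whose column p holds the pixels
   under the filter at output position p, A is the filter flattened into a
   1 x (K*K) row, and C is a constant bias. */
static void filter_as_gemm(float img[H][W], float f[K][K], float bias,
                           float out[OH][OW]) {
    float cols[K * K][OH * OW];
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            for (int y = 0; y < OH; ++y)
                for (int x = 0; x < OW; ++x)
                    cols[i * K + j][y * OW + x] = img[y + i][x + j];

    for (int p = 0; p < OH * OW; ++p) {
        float acc = bias;                         /* C */
        for (int k = 0; k < K * K; ++k)
            acc += f[k / K][k % K] * cols[k][p];  /* A * B */
        out[p / OW][p % OW] = acc;                /* D */
    }
}

/* Runs both versions on the same input (bias 0) and asserts they agree. */
static void check_equivalence(float img[H][W], float f[K][K]) {
    float a[OH][OW], b[OH][OW];
    filter_direct(img, f, a);
    filter_as_gemm(img, f, 0.0f, b);
    for (int y = 0; y < OH; ++y)
        for (int x = 0; x < OW; ++x) {
            float d = a[y][x] - b[y][x];
            assert(d < 1e-4f && d > -1e-4f);
        }
}
```

With a single filter, A is just one row; with many filters or channels, A becomes a full matrix and the filter turns into the kind of dense matrix product that tensor cores are designed to accelerate.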
Nvidia has recently added a few new extensions, one of them being VK_NV_cooperative_matrix, which allows the use of tensor cores inside Vulkan.
I believe support for this new feature was added to glslang only yesterday, which is why you haven't seen it until now (see here). Here are some examples of it being used:
https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.coopmat.comp
https://github.com/KhronosGroup/glslang/blob/4605e2ed2b2b1acbe157d365c3c528367b8b168f/Test/spv.1.3.coopmat.comp
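```glsl
#version 450 core
#extension GL_KHR_memory_scope_semantics : enable
#extension GL_NV_cooperative_matrix : enable
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable
// glslang-specific pragma kept from the original test file
#pragma use_variable_pointers

layout (local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0) coherent buffer Block {
    float y[1024*1024];
    float x[];
} block;

void main()
{
    // This is a glslang compiler test: it exercises the syntax rather than
    // computing anything useful.

    // 16x8 matrix of 32-bit floats, owned cooperatively by the whole subgroup
    fcoopmatNV<32, gl_ScopeSubgroup, 16, 8> m = fcoopmatNV<32, gl_ScopeSubgroup, 16, 8>(0.0);

    // element-wise arithmetic and scalar multiplication
    m = m + m;
    m = m - m;
    m = -m;
    m = 2.0*m;
    m = m*2.0;

    // start at element 16 of block.x, 128 elements between rows, row-major (colMajor = false)
    coopMatLoadNV(m, block.x, 16, 128, false);
    coopMatStoreNV(m, block.x, 16, 128, false);
}
```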
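This appears to be quite analogous to how it's done in CUDA: the data has to be explicitly loaded from buffer memory into a cooperative matrix (coopMatLoadNV) and stored back afterwards (coopMatStoreNV), so that the tensor cores can operate on it.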
So to use them, you need VK_NV_cooperative_matrix in Vulkan and GL_NV_cooperative_matrix in GLSL.
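The coopMatLoadNV/coopMatStoreNV calls in the shader take a starting element, a stride and a colMajor flag, and the actual matrix product is done with coopMatMulAddNV (which the test shader does not call). To make those semantics concrete, here is a CPU-side model in C. It is only a sketch for illustration, not how the driver or hardware implements it (on the GPU the tile's elements are spread across the invocations of the subgroup), and the helper names tile_index, tile_load, tile_store and tile_muladd are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flat-buffer index of tile element (r, c), following the coopMatLoadNV /
   coopMatStoreNV convention: `element` is the index of the first element and
   `stride` the distance between consecutive rows (row-major) or columns
   (column-major). */
static size_t tile_index(size_t element, size_t stride, int colMajor,
                         size_t r, size_t c) {
    return colMajor ? element + c * stride + r : element + r * stride + c;
}

/* CPU model of a load: copy a rows x cols tile out of buf into m (row-major). */
static void tile_load(float *m, size_t rows, size_t cols, const float *buf,
                      size_t element, size_t stride, int colMajor) {
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            m[r * cols + c] = buf[tile_index(element, stride, colMajor, r, c)];
}

/* CPU model of a store. Overlapping rows/columns would make the result
   depend on write order, so this sketch rejects them. */
static void tile_store(const float *m, size_t rows, size_t cols, float *buf,
                       size_t element, size_t stride, int colMajor) {
    assert(stride >= (colMajor ? rows : cols));
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            buf[tile_index(element, stride, colMajor, r, c)] = m[r * cols + c];
}

/* D = A * B + C with A: M x Kd, B: Kd x N, C and D: M x N (all row-major).
   This is the operation coopMatMulAddNV performs on whole tiles. */
static void tile_muladd(size_t M, size_t N, size_t Kd, const float *A,
                        const float *B, const float *C, float *D) {
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = C[i * N + j];
            for (size_t k = 0; k < Kd; ++k)
                acc += A[i * Kd + k] * B[k * N + j];
            D[i * N + j] = acc;
        }
}
```

With the shader's arguments (element 16, stride 128, row-major), row r of the 16x8 tile starts at index 16 + 128 * r, so row 1 starts at index 144.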
EDIT:
j00hi has mentioned that there is now an Nvidia blog post on how to use these tensor cores.