I have a fairly large object that I want to load into shared memory so that multiple warps can access it, something like:
struct alignas(8) Foo {
    int a;
    float b;
    vec2 c;
    uvec2 d;
    uint64_t e;
    // ...
};
In CUDA, I'd handle this by reinterpreting the data as an array of a type with the same alignment and copying that into shared memory:
#include <cstdint>

__global__ void bar(const Foo* global_foo_ptr){
    __shared__ Foo shared_foo;
    // Index within the block: each block fills its own copy of shared_foo.
    int tid = threadIdx.x;
    int threads_per_block = blockDim.x;
    // View both objects as arrays of 64-bit words (const-qualified on the source side).
    auto reinterpreted_shared_foo_ptr = reinterpret_cast<std::uint64_t*>(&shared_foo);
    auto reinterpreted_global_foo_ptr = reinterpret_cast<const std::uint64_t*>(global_foo_ptr);
    // Strided copy: consecutive threads read consecutive words, so the loads coalesce.
    for(int i = tid; i < sizeof(Foo) / sizeof(std::uint64_t); i += threads_per_block){
        reinterpreted_shared_foo_ptr[i] = reinterpreted_global_foo_ptr[i];
    }
    __syncthreads(); // the whole object must be loaded before any thread uses it
}
which allows coalesced reads that all threads in the block participate in.
But I don't know how to do the same in Vulkan GLSL. I can't find any "reinterpret" cast in GLSL; as far as I can tell, GL_EXT_buffer_reference2 provides this capability, but only on the global memory side.
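To illustrate, here is roughly what I mean (a sketch only; WordRef, PC, foo_address and FOO_WORDS are placeholder names I made up, and the word count would really have to be supplied by the host, since GLSL has no sizeof):
#version 450
#extension GL_EXT_buffer_reference2 : require
#extension GL_EXT_shader_explicit_arithmetic_types_int64 : require

layout(local_size_x = 64) in;

// View an arbitrary device address as an array of 64-bit words.
layout(buffer_reference, std430, buffer_reference_align = 8) buffer WordRef {
    uint64_t words[];
};

layout(push_constant) uniform PC {
    uint64_t foo_address; // from vkGetBufferDeviceAddress on the host
};

// 4 words matches the members shown above; the real count comes from the host.
const uint FOO_WORDS = 4;

// The destination has to be a plain word array, because there is no way to
// view a `shared Foo` as words -- which is exactly my problem.
shared uint64_t shared_words[FOO_WORDS];

void main() {
    WordRef src = WordRef(foo_address); // "reinterpret" the raw address
    for (uint i = gl_LocalInvocationID.x; i < FOO_WORDS; i += gl_WorkGroupSize.x) {
        shared_words[i] = src.words[i];
    }
    barrier();
    // ... but I still end up with raw words, not a shared Foo I can use directly.
}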
GLSL doesn't let you do low-level stuff like that. And I'm not sure you can do it through SPIR-V directly either.
SPIR-V allows something like this through OpBitcast. However, this requires that the pointers involved use physical addressing. And while storage buffers can be set to use physical addressing (if the implementation allows it), I don't see anything that would allow the Workgroup storage class (the SPIR-V equivalent to shared) to use physical addressing.
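To make that asymmetry concrete, here is a minimal sketch; the declarations are real GLSL, while the comments are my simplified description of what implementations typically emit:
#version 450
#extension GL_EXT_buffer_reference : require

// Compiles to pointers in the PhysicalStorageBuffer storage class under the
// PhysicalStorageBuffer64 addressing model, so casts between reference types
// can be lowered to OpBitcast on a physical pointer.
layout(buffer_reference) buffer Words { uint data[]; };

// Lives in the Workgroup storage class, which only has logical pointers;
// no addressing model makes an OpBitcast of its address legal SPIR-V.
shared uint shared_words[16];

layout(local_size_x = 1) in;
void main() {}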