I'm writing an algorithm in OpenCL in which every work-item needs to remember a fair amount of data, somewhere between a long[70] and a long[200] or so per work-item.
Recent AMD devices have 32 KiB of __local memory, which (for the given amount of data per work-item) is enough to store the data for 20 to 58 work-items. However, from what I understand of the architecture (and especially from this drawing), each shader core also has a dedicated amount of private memory. I can't find its size, though.
Can anyone tell me how to find out how much private memory each kernel has?
I'm particularly curious about the HD7970, since I plan to buy some of these soon.
Edit: Problem solved; the answer is in Appendix D of the AMD APP OpenCL Programming Guide.
The answer was given by user talonmies in the comments, so I'm posting it as an answer here to close the question.
These values can be found in Appendix D of the AMD APP OpenCL Programming Guide http://developer.amd.com/sdks/amdappsdk/assets/amd_accelerated_parallel_processing_opencl_programming_guide.pdf (a similar document exists for NVIDIA). Apparently a register on AMD devices is 128 bits (4x32), and all modern high-end devices have 16384 registers per compute unit, which comes to a remarkable 256 KB per compute unit.