I was reading the presentation "Optimizing Parallel Reduction in CUDA" by Mark Harris. Here is the slide I have a problem with:
It says there is a bank conflict problem in this method. But why? Each thread accesses two consecutive memory cells, which are in different banks, and no two threads access the same memory cell concurrently.
This presentation dates from the very early days of CUDA and applies to first-generation hardware.
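That hardware (compute capability 1.x) had shared memory arranged in 16 banks, each 32 bits wide, and it serviced a warp's shared memory request one half-warp (16 threads) at a time. A bank conflict occurs when different threads of the same half-warp access different words in the same bank; whether a single thread's two accesses fall in different banks does not matter. Because every sixteenth 32-bit entry in the shared array resides in the same bank, the strided addresses used by neighboring threads pile up on a few banks, and there are bank conflicts at a number of levels of that reduction tree.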
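To make the half-warp argument concrete, here is a small Python model (an illustrative sketch, not CUDA code) that counts how many distinct words each bank must serve for one half-warp. It assumes the compute capability 1.x layout of 16 banks of 32-bit words, and it assumes a hypothetical interleaved tree that generalizes the first-level pattern (thread tid reads elements 2*tid and 2*tid + 1) to a stride s that doubles at each level, so thread tid reads sdata[2*s*tid]. The helper names are made up for this sketch, and only the first read of each pair is modeled.

```python
# Assumed model of compute capability 1.x shared memory: 16 banks of 32-bit
# words, with each request serviced one half-warp (16 threads) at a time.
NUM_BANKS = 16
HALF_WARP = 16

def conflict_degree(addresses, num_banks=NUM_BANKS):
    """Most distinct words any single bank must serve (1 = conflict-free).
    Repeated reads of one word are counted once (simplified broadcast)."""
    words_per_bank = {}
    for addr in addresses:
        words_per_bank.setdefault(addr % num_banks, set()).add(addr)
    return max((len(w) for w in words_per_bank.values()), default=1)

def interleaved_degrees(block_size=256):
    """Conflict degree of the first half-warp's sdata[2*s*tid] read at each
    level s = 1, 2, 4, ... (hypothetical interleaved indexing). Threads whose
    index falls outside the block are inactive and issue no read."""
    result = {}
    s = 1
    while s < block_size:
        addrs = [2 * s * tid for tid in range(HALF_WARP) if 2 * s * tid < block_size]
        result[s] = conflict_degree(addrs)
        s *= 2
    return result

print(interleaved_degrees())
# {1: 2, 2: 4, 4: 8, 8: 16, 16: 8, 32: 4, 64: 2, 128: 1}
```

Under these assumptions the degree climbs to 16 (the whole half-warp serialized on bank 0) at s = 8 and only falls back as fewer threads remain active, even though each thread's own two reads at the first level fall in different banks.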
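Newer hardware (compute capability 2.x and later) expanded shared memory to 32 banks and services a full warp at once. That changes the exact conflict pattern but does not remove it for this method: at the first level thread t starts its pair at element 2t, so threads t and t + 16 of a warp still land in the same bank (a 2-way conflict), and the conflicts get worse as the stride doubles at each level. The fix is in the access pattern rather than the bank count, which is why the later kernels in the presentation switch to sequential addressing, where consecutive threads read consecutive words.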
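A similar Python model, re-parameterized for 32 banks and a full 32-thread warp (the compute capability 2.x+ layout), can compare a strided pattern with a sequential one. As a sketch under stated assumptions: the helper names are hypothetical, only warp 0 and only the first read of each pair are modeled, the strided pattern has thread tid read sdata[2*s*tid] with s doubling, and the sequential pattern has active threads tid < s read sdata[tid] with s halving from block_size/2.

```python
# Assumed model of compute capability 2.x+ shared memory: 32 banks of 32-bit
# words, one request per full warp. Only warp 0 (threads 0..31) is modeled.
NUM_BANKS = 32
WARP_SIZE = 32

def conflict_degree(addresses, num_banks=NUM_BANKS):
    """Most distinct words any single bank must serve (1 = conflict-free).
    Repeated reads of one word are counted once (simplified broadcast)."""
    words_per_bank = {}
    for addr in addresses:
        words_per_bank.setdefault(addr % num_banks, set()).add(addr)
    return max((len(w) for w in words_per_bank.values()), default=1)

def interleaved_reads(block_size):
    """Per level (s = 1, 2, 4, ...), the sdata[2*s*tid] addresses read by
    warp 0's active threads (hypothetical interleaved indexing)."""
    s = 1
    while s < block_size:
        yield [2 * s * t for t in range(WARP_SIZE) if 2 * s * t < block_size]
        s *= 2

def sequential_reads(block_size):
    """Per level (s = block_size/2, ..., 1), the sdata[tid] addresses read by
    warp 0's active threads tid < s (hypothetical sequential indexing)."""
    s = block_size // 2
    while s > 0:
        yield [t for t in range(WARP_SIZE) if t < s]
        s //= 2

def worst_degree(levels):
    """Worst conflict degree over all levels of the reduction tree."""
    return max(conflict_degree(addrs) for addrs in levels)

print(worst_degree(interleaved_reads(256)))  # 8
print(worst_degree(sequential_reads(256)))   # 1
```

In this model a 256-thread block still reaches an 8-way conflict with the strided pattern on 32-bank hardware (32-way with 1024 threads), while consecutive threads reading consecutive words never share a bank.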