Is there any way I could EXPLICITLY limit the number of GPU multiprocessors being used during the runtime of my program? I would like to measure how my algorithm scales with a growing number of multiprocessors.
If it helps: I am using CUDA 4.0 and a device with compute capability 2.0.
Aaahhh... I know the problem. I played with it myself when writing a paper.
There is no explicit way to do it; however, you can try "hacking" it by having some of the blocks do nothing.
From my own experiments, compute capability 1.3 devices (I had a GTX 285) schedule the blocks in sequence: if I launch 60 blocks onto 30 SMs, blocks 1-30 are scheduled onto SMs 1-30, and blocks 31-60 again onto SMs 1-30. So, by disabling blocks 5 and 35, SM number 5 ends up doing practically nothing (see the sketch below).
Note, however, that this is a private, experimental observation I made two years ago. It is in no way confirmed, supported, or maintained by NVIDIA, and it may change (or may already have changed) with newer GPUs and/or drivers.
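With that caveat in mind, here is a minimal sketch of the hack. The kernel name, the dummy workload, and the choice of disabled block indices are placeholders of mine, and the round-robin block-to-SM mapping is only the assumption described above, not anything guaranteed:

    // Sketch of the "do-nothing blocks" hack, assuming the (unconfirmed)
    // round-robin block-to-SM scheduling described above on a 30-SM device.
    #include <cuda_runtime.h>

    __global__ void dummyWork(float *out, int iters)
    {
        // Blocks 5 and 35 exit immediately; under the assumed schedule both
        // would land on the same SM, leaving that SM practically idle.
        if (blockIdx.x == 5 || blockIdx.x == 35)
            return;

        // A lot of "stupid work" so an idle SM becomes visible in the runtime.
        float x = (float)threadIdx.x;
        for (int i = 0; i < iters; ++i)
            x = x * 0.999f + 0.001f;
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;
    }

    int main()
    {
        const int blocks = 60, threads = 256;   // 60 blocks onto 30 SMs
        float *d_out;
        cudaMalloc((void**)&d_out, blocks * threads * sizeof(float));

        dummyWork<<<blocks, threads>>>(d_out, 1 << 20);
        cudaDeviceSynchronize();

        cudaFree(d_out);
        return 0;
    }

Comparing the runtime of this launch against the same launch with the early exit removed (or with other block indices disabled) is exactly the kind of enabled/disabled experiment suggested below.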
I would suggest playing with some simple kernels that do a lot of stupid work and seeing how long they take to compute in various "enabled"/"disabled" configurations. If you are lucky, you will catch a performance drop, indicating that two blocks are in fact being executed by a single SM.
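A complementary check, not mentioned above but worth trying, is to read the %smid PTX special register from each block and dump which SM each block actually ran on. The register is documented in the PTX ISA; the mapping it reveals, however, comes with no guarantees, and all names below are placeholders:

    // Sketch: record which SM each block actually ran on by reading the %smid
    // PTX special register. NVIDIA makes no promises about the block-to-SM
    // assignment this reveals, so treat the output as a snapshot, not a rule.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void recordSM(unsigned int *smOfBlock)
    {
        if (threadIdx.x == 0) {
            unsigned int smid;
            asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
            smOfBlock[blockIdx.x] = smid;     // one entry per block
        }
    }

    int main()
    {
        const int blocks = 60, threads = 128;
        unsigned int *d_sm, h_sm[60];
        cudaMalloc((void**)&d_sm, blocks * sizeof(unsigned int));

        recordSM<<<blocks, threads>>>(d_sm);
        cudaMemcpy(h_sm, d_sm, blocks * sizeof(unsigned int),
                   cudaMemcpyDeviceToHost);

        for (int b = 0; b < blocks; ++b)
            printf("block %2d ran on SM %u\n", b, h_sm[b]);

        cudaFree(d_sm);
        return 0;
    }

If two of your "disabled" block indices report the same SM and that SM never shows up for any working block, the hack is behaving the way you hoped.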