I'm using the CUDA Occupancy Calculator to try to optimize my CUDA kernel. Currently I'm using 34 registers per thread and zero shared memory...Thus the maximum occupancy is 63% for 310 threads per block. If I could somehow reduce the register count to 20 or below (e.g. by passing kernel parameters via shared memory), I could get an occupancy of 100%. Is this a good approach, or would you advise a different path for optimizing?
I'm also wondering whether there's a newer version of the occupancy calculator for Compute Capability 2.1.
Some points to consider:
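The 63% figure from the question can be reproduced with a few lines of arithmetic, which helps show what a lower register count would and would not buy. The following is only a simplified sketch: the per-multiprocessor limits (32768 registers, 48 resident warps, 8 resident blocks, per-warp register allocation in units of 64) are assumed values for compute capability 2.x and are not taken from the question, shared memory is ignored, and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-multiprocessor limits for compute capability 2.x.
// These values are assumptions; verify them for your device.
constexpr int kWarpSize = 32;
constexpr int kRegistersPerSM = 32768;
constexpr int kMaxWarpsPerSM = 48;
constexpr int kMaxBlocksPerSM = 8;
constexpr int kRegAllocUnit = 64;  // assumed: registers are allocated per warp in units of 64

// Round value up to the next multiple of unit.
int round_up(int value, int unit) { return (value + unit - 1) / unit * unit; }

// Hypothetical helper: warps resident on one multiprocessor,
// limited only by registers, the warp limit, and the block limit.
int active_warps(int regs_per_thread, int threads_per_block) {
    assert(regs_per_thread > 0 && threads_per_block > 0);
    int warps_per_block = (threads_per_block + kWarpSize - 1) / kWarpSize;
    int regs_per_warp = round_up(regs_per_thread * kWarpSize, kRegAllocUnit);
    int warps_by_regs = kRegistersPerSM / regs_per_warp;
    int blocks_by_regs = warps_by_regs / warps_per_block;
    int blocks_by_warps = kMaxWarpsPerSM / warps_per_block;
    int blocks = std::min({blocks_by_regs, blocks_by_warps, kMaxBlocksPerSM});
    return blocks * warps_per_block;
}

// Occupancy as the fraction of the assumed warp limit that is resident.
double occupancy(int regs_per_thread, int threads_per_block) {
    return static_cast<double>(active_warps(regs_per_thread, threads_per_block)) /
           kMaxWarpsPerSM;
}
```

Under these assumptions, `occupancy(34, 310)` comes out at 30 of 48 warps, i.e. 62.5%, which matches the 63% reported above. With 20 registers and the same 310 threads per block the model gives 40 warps (about 83%), because a 10-warp block does not divide 48 evenly; a block size such as 256 would be needed to reach all 48 warps. So the register count and the block size probably have to be tuned together.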