
GTX 680, Kepler and maximum registers per thread

Tags:

cuda

I am asking the following questions as I am confused...

Various sites and papers state that the Kepler architecture increased the number of registers per thread, but on my GTX 680 this does not seem to be true: RegsPerBlock is 65536, so with 1024 threads that works out to 64 registers per thread. What am I missing? Will there be more registers per thread in the future?

Regards Daniel

Daniel asked Oct 25 '12 22:10


2 Answers

There are two variants of the Kepler architecture, sm_30 and sm_35. The GTX 680 is based on the GK104 GPU, which implements the sm_30 architecture. This architecture provides 64 registers per thread, of which 63 are available to user code, the remaining one being a dedicated zero register. Future GK110-based parts such as the K20 implement the sm_35 architecture, which provides 256 registers per thread, of which 255 are available to user code (once again, one is a dedicated zero register).
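A minimal C++ sketch of why the calculation in the question is misleading: it encodes the per-thread limits stated above for sm_30 and sm_35 and compares them with the block-wide average (65536 / 1024 = 64) from the question. The names `RegLimits`, `kSm30`, `kSm35` and `average_regs_per_thread` are hypothetical, chosen only for illustration; the checks are compile-time `static_assert`s, so the file compiles only if the numbers line up.

```cpp
// Per-thread register limits for the two Kepler variants described above.
// (Hypothetical names; the values come from the answer text.)
struct RegLimits {
    int hardware_regs_per_thread;  // includes the dedicated zero register
    int usable_regs_per_thread;    // available to user code
};

constexpr RegLimits kSm30{64, 63};    // GK104, e.g. GTX 680
constexpr RegLimits kSm35{256, 255};  // GK110, e.g. K20

// The calculation from the question: registers per block / max threads per block.
constexpr int average_regs_per_thread(int regs_per_block, int max_threads_per_block) {
    return regs_per_block / max_threads_per_block;
}

// On sm_30 the per-thread cap happens to equal the 1024-thread average...
static_assert(average_regs_per_thread(65536, 1024) == kSm30.hardware_regs_per_thread,
              "sm_30: cap coincides with 65536 / 1024");
// ...while on sm_35 the per-thread cap is four times that average.
static_assert(kSm35.hardware_regs_per_thread == 4 * average_regs_per_thread(65536, 1024),
              "sm_35: cap exceeds 65536 / 1024");
```

The coincidence on the GTX 680 is what makes the per-thread limit look like a simple division; on sm_35 the two quantities diverge.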

njuffa answered Oct 18 '22 02:10


While what @njuffa wrote is true, it is also important to note that the maximum number of registers per thread is not necessarily equal to (register file size / maximum number of threads per block). It may be that threads can use the maximum possible number of registers only when the thread blocks are smaller.

... and in fact, that is exactly the case with CC 3.5 Kepler cards, and with Maxwell 5.x and Pascal 6.0 cards: the register file has 64 Ki (65536) registers and the maximum block size is 1024 threads, but the maximum number of registers per thread is 255 (plus the zero register). Only launches with at most 256 threads per block get 255 registers per thread.
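The 256-thread figure can be reproduced with a simplified model. The sketch below assumes (this is not stated in the answers) that registers are allocated per warp in units of 256, which is the allocation granularity NVIDIA's occupancy calculator lists for these compute capabilities, and it ignores every other limit such as shared memory or warps per SM. All names are hypothetical.

```cpp
// Simplified model: only the per-block register budget is considered.
constexpr int kRegsPerBlock = 64 * 1024;    // 64 Ki registers
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kWarpSize = 32;
constexpr int kRegAllocUnit = 256;          // assumed per-warp allocation unit

// Round value up to the next multiple of `multiple`.
constexpr int round_up(int value, int multiple) {
    return (value + multiple - 1) / multiple * multiple;
}

// Largest block size (a whole number of warps) whose registers fit in the
// per-block budget when every thread uses regs_per_thread registers.
constexpr int max_threads_per_block_for_regs(int regs_per_thread) {
    const int regs_per_warp = round_up(regs_per_thread * kWarpSize, kRegAllocUnit);
    const int warps = kRegsPerBlock / regs_per_warp;
    const int threads = warps * kWarpSize;
    return threads < kMaxThreadsPerBlock ? threads : kMaxThreadsPerBlock;
}

// 255 regs round up to 8192 per warp, so only 8 warps (256 threads) fit.
static_assert(max_threads_per_block_for_regs(255) == 256, "255 regs -> 256 threads");
// 64 regs need 2048 per warp, so a full 1024-thread block fits.
static_assert(max_threads_per_block_for_regs(64) == 1024, "64 regs -> 1024 threads");
```

Under this model, 128 registers per thread would cap the block at 512 threads, and any count up to 64 registers per thread still allows a full 1024-thread block.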

einpoklum answered Oct 18 '22 02:10