
Why does CUDA float program get faster in full speed FP64 mode?

My CUDA program uses only float, int, short and char types in its computation. None of the input or output arrays have members of type double. And none of the kernels create any double type inside them during computation.

This program has been compiled using CUDA SDK 5.5 in Release mode using NSight Eclipse. A typical compile line looks like this:

nvcc -O3 -gencode arch=compute_35,code=sm_35 -M -o "src/foo.d" "../src/foo.cu"

I am running this program on a GTX Titan on Linux. To my surprise, I noticed that it runs 10% faster when I enable the full-speed FP64 mode on the Titan. This is done by enabling the CUDA Double Precision option in the NVIDIA X Server Settings program.

While I am happy with this free speed bonus, I would like to understand why a CUDA float program could get faster in FP64 mode.

Ashwin Nanjappa asked Dec 04 '13 08:12



1 Answer

My guess is that when you enable the full-speed FP64 mode on the Titan, more compute units take part in the computation, and these FP64 units can also be used for FP32 work. However, enabling the large number of FP64 units also lowers the clock speed, so the computation gets faster by only about 10%.

Where does the 10% come from? When the Titan runs in 1/24 FP64 mode, it runs at 837 MHz. When it runs in 1/3 FP64 mode, it runs at 725 MHz. So (1 + 1/3) / (1 + 1/24) * 725/837 ≈ 1.109.

Reference: http://www.anandtech.com/show/6760/nvidias-geforce-gtx-titan-part-1/4
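The estimate above can be reproduced with a few lines of Python. This is only a sketch of my guess, not a measurement: it assumes that FP32 throughput scales with (1 + FP64 rate), meaning the enabled FP64 units do FP32 work at full rate, and that runtime scales inversely with clock speed. The function name `estimated_speedup` and its parameters are made up for illustration; the rates and clocks are the ones quoted above.

```python
from fractions import Fraction


def estimated_speedup(fp64_rate_default, fp64_rate_full,
                      clock_default_mhz, clock_full_mhz):
    """Estimate the FP32 speedup from switching to full-speed FP64 mode.

    Assumption (a guess, not a documented fact): usable FP32 throughput
    is proportional to (1 + FP64 rate) times the clock speed.
    """
    # Relative gain in usable compute units when more FP64 units are enabled.
    unit_gain = (1 + fp64_rate_full) / (1 + fp64_rate_default)
    # Relative loss from the lower clock in full-speed FP64 mode.
    clock_factor = Fraction(clock_full_mhz, clock_default_mhz)
    return float(unit_gain * clock_factor)


# Values from the answer: 1/24 rate at 837 MHz vs. 1/3 rate at 725 MHz.
speedup = estimated_speedup(Fraction(1, 24), Fraction(1, 3), 837, 725)
print(f"{speedup:.3f}")  # prints 1.109
```

The unit gain alone is 1.28, so without the clock drop the same model would predict a 28% speedup; the lower clock brings it down to about 11%.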

I found confirmation of my guess:

"What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32."

Reference: http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2

This information is for GK104, whereas the Titan has a GK110. But both belong to the same architecture (Kepler), so I think GK110 has this capability as well.

AlexanderKomarov answered Oct 31 '22 19:10
