
fmad=false gives good performance

Tags: cuda, nvidia, fma

From Nvidia release notes:

 The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of
 floating-point multiplies and add/subtracts into floating-point multiply-add
 operations (FMAD, FFMA, or DFMA) has been added:
 --fmad=true and --fmad=false enables and disables the contraction respectively.
 This switch is supported only when the --gpu-architecture option is set with
 compute_20, sm_20, or higher. For other architecture classes, the contraction is
 always enabled.
 The --use_fast_math option implies --fmad=true, and enables the contraction.

I have two kernels: one is purely compute-bound with lots of multiplications, while the other is memory-bound. I see a consistent performance improvement (around 5%) for the compute-intensive kernel when I compile with -fmad=false, and a decline of about the same percentage when I turn FMA off for the memory-bound kernel. So FMA works better for my memory-bound kernel, but my compute-bound kernel can squeeze out a little performance with FMA turned off. What could be the reason? My device is an M2090 and I am using CUDA 4.2.

Full compilation options: -arch,sm_20,-ftz=true,-prec-div=false,-prec-sqrt=false,-use_fast_math,-fmad=false (or I simply remove -fmad=false, because -fmad=true is the default anyway).

Sayan asked Aug 17 '12 19:08


1 Answer

Use of FMA may increase register pressure slightly, because three source operands must be available at the same time. Turning FMA generation on or off can therefore lead to small differences in instruction scheduling and register allocation, which in turn can lead to small performance differences. For a compute-bound kernel with many multiply-add idioms, -fmad=true should make a significant performance difference. But as you say, your kernel is dominated by multiplies, so it will benefit little from FMA, and any gains may be offset by the register pressure and instruction scheduling effects.
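To make the distinction concrete, here is a minimal sketch, not the asker's code, of the two kinds of kernel discussed above. The kernel names (axpy, axpy_unfused, cube), the helper run_demo, and the input values are all made up for illustration. axpy contains a multiply-add idiom that the compiler is allowed to contract under -fmad=true; cube contains only multiplies, so there is nothing to contract; axpy_unfused uses the __fmul_rn and __fadd_rn intrinsics, which are documented as never being merged into an FMA, to show the uncontracted form explicitly.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Multiply-add idiom: a candidate for contraction into one FMA under -fmad=true.
__global__ void axpy(float a, const float* x, const float* y, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}

// Same arithmetic, but the intrinsics keep the multiply and the add separate.
__global__ void axpy_unfused(float a, const float* x, const float* y, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __fadd_rn(__fmul_rn(a, x[i]), y[i]);
}

// Multiply-only kernel: no add follows a multiply, so -fmad has nothing to fuse.
__global__ void cube(const float* x, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] * x[i] * x[i];
}

// Hypothetical driver: runs all three kernels and checks them against the host.
// Returns 0 on success, nonzero on a CUDA error or a mismatch.
int run_demo()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    const float a = 2.5f;
    float hx[n], hy[n], fused[n], unfused[n], cubed[n];
    for (int i = 0; i < n; ++i) { hx[i] = 0.001f * i; hy[i] = 1.0f; }

    float *dx = 0, *dy = 0, *dout = 0;
    if (cudaMalloc((void**)&dx, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&dy, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&dout, bytes) != cudaSuccess) return 1;
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    axpy<<<blocks, threads>>>(a, dx, dy, dout, n);
    if (cudaMemcpy(fused, dout, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    axpy_unfused<<<blocks, threads>>>(a, dx, dy, dout, n);
    if (cudaMemcpy(unfused, dout, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cube<<<blocks, threads>>>(dx, dout, n);
    if (cudaMemcpy(cubed, dout, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    cudaFree(dx); cudaFree(dy); cudaFree(dout);

    // Contraction changes rounding only slightly, so compare with a tolerance.
    float maxDiff = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxDiff = fmaxf(maxDiff, fabsf(fused[i] - unfused[i]));
        if (fabsf(unfused[i] - (a * hx[i] + hy[i])) > 1e-5f) return 2;
        if (fabsf(cubed[i] - hx[i] * hx[i] * hx[i]) > 1e-5f) return 2;
    }
    printf("max |axpy - axpy_unfused| = %g\n", maxDiff);
    return 0;
}

int main() { return run_demo(); }
```

Assuming the file is saved as fmad_demo.cu (a made-up name), compiling it once with `nvcc -arch=sm_20 -fmad=true -Xptxas -v fmad_demo.cu` and once with `-fmad=false` lets you compare the per-kernel register counts that ptxas reports, and `cuobjdump -sass` on the resulting binary shows whether FFMA instructions were actually generated. Newer toolkits no longer accept sm_20, so substitute an architecture your installation supports. Timing differences of a few percent, like those in the question, would still need to be measured on the real kernels.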

njuffa answered Sep 21 '22 01:09