 

Will gfortran or ifort compilers wisely use SIMD instructions when summing the product of two arrays?

I've got some code written with numpy, and I'm considering porting it to Fortran for better performance.

One operation I do several times is summing the element-wise product of two arrays:

sum(A*B)

It looks like fused multiply-add instructions would help with this. My current processor doesn't support these instructions, so I can't test things yet. However, I may upgrade to a new processor that does support FMA3 (an Intel Haswell processor).

Does anyone know if compiling the program with "-march=native" (or the ifort equivalent) will be enough to get the compiler (either gfortran or ifort) to wisely use SIMD instructions to optimize that code, or do you think I'll have to baby the compilers or code?

asked Jan 10 '14 17:01 by lnmaurer



1 Answer

If you use -march=native on a machine with SIMD support, gfortran should generate SIMD instructions wherever it manages to vectorize a loop. With ifort I have always used the equivalent -xHost flag instead.
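One reason -march=native alone may not be enough for this particular expression: to use SIMD, the compiler has to split sum(A*B) into several partial sums, one per SIMD lane, and add them together at the end. The pure-Python sketch below models that reordering; the function names and lane counts are illustrative assumptions, not what any compiler literally emits. Because floating-point addition is not associative, the reordered sum can differ from the strictly sequential one. That is why GCC, and therefore gfortran, only vectorizes a floating-point reduction like this when it is allowed to reassociate (e.g. with -ffast-math), while ifort's default floating-point model (-fp-model fast=1) already permits it.

```python
def dot_sequential(a, b):
    """sum(A*B) evaluated strictly left to right, as a scalar loop does."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y  # one multiply-add per element
    return s


def dot_lanes(a, b, lanes=4):
    """Rough model of a vectorized reduction: element i is accumulated into
    partial sum i % lanes (one per SIMD lane; 2 doubles fit in a 128-bit SSE2
    register, 4 in a 256-bit AVX register), and the partial sums are added
    together at the end."""
    acc = [0.0] * lanes
    for i, (x, y) in enumerate(zip(a, b)):
        acc[i % lanes] += x * y
    total = 0.0
    for p in acc:
        total += p
    return total


# Benign data: both orders give the same, exactly representable result.
a = [float(i + 1) for i in range(10)]
b = [0.5 * i for i in range(10)]
print(dot_sequential(a, b), dot_lanes(a, b))  # 165.0 165.0

# Contrived data: reordering the additions changes the answer
# (1e16 + 1.0 rounds back to 1e16 in the sequential order).
c = [1e16, 1.0, -1e16, 1.0]
d = [1.0, 1.0, 1.0, 1.0]
print(dot_sequential(c, d), dot_lanes(c, d, lanes=2))  # 1.0 2.0
```

With FMA3 hardware and a matching -march setting, each `acc += x*y` update in the vectorized version can map onto one fused multiply-add instruction.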

But I am not sure how to make them do it "wisely". My impression is that at -O3 both ifort and gfortran tend to be overly aggressive about vectorization (that is, they use SIMD instructions in places where it does not pay off). Quite often I have to turn vectorization off (-no-vec in ifort, -fno-tree-vectorize in gfortran) to get the most efficient code. Of course, this may or may not be true for your code.

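It will often be better to use library routines that are already optimized for this kind of task. MKL's vdMul and GSL's gsl_vector_mul compute the element-wise product; for the complete sum(A*B), a BLAS dot product (ddot, which MKL provides, or gsl_blas_ddot in GSL) does the multiplication and the summation in a single call.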
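MKL's vdMul and GSL's gsl_vector_mul compute only the element-wise product. Building sum(A*B) from them therefore takes two passes over the data, plus either a temporary array (vdMul writes into an output vector) or an overwritten input (gsl_vector_mul multiplies in place). A dot-product routine multiplies and accumulates in one pass. The following is a minimal pure-Python sketch of the two shapes. The function names are hypothetical, and the sketch models the data flow only, not the libraries' performance.

```python
from array import array


def product_then_sum(a, b):
    """Two passes: an element-wise multiply (the job vdMul or gsl_vector_mul
    does) into a temporary array, followed by a separate summation."""
    tmp = array("d", (x * y for x, y in zip(a, b)))  # pass 1: element-wise product
    total = 0.0
    for t in tmp:  # pass 2: reduction over the temporary
        total += t
    return total


def dot(a, b):
    """One pass, in the style of a BLAS ddot: multiply and accumulate
    together, with no temporary array."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


a = [float(i + 1) for i in range(10)]
b = [0.5 * i for i in range(10)]
print(product_then_sum(a, b), dot(a, b))  # 165.0 165.0
```

In numpy terms, sum(A*B) has the two-pass shape (it materializes A*B first), while numpy.dot(A, B) has the one-pass shape and is typically backed by BLAS.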

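Using -march=NEWARCH lets the compiler emit any instruction NEWARCH supports (for Haswell, -march=haswell enables AVX2 and FMA), so the resulting binary may not run on an earlier architecture. With -mtune=NEWARCH, where NEWARCH is the architecture of your new processor, the code is only tuned for NEWARCH while keeping the instruction set selected by -march (or the compiler's default), so it still runs on the old machine. Since you do not have the new machine yet, -mtune is probably what you need for now. Keep in mind that it will not emit FMA instructions, so it cannot show you the FMA speedup.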
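A binary built with -march=haswell may stop with an illegal-instruction error on a CPU that lacks those extensions. On Linux/x86, one way to check a machine beforehand is the `flags` line of /proc/cpuinfo. Here is a minimal Python sketch that assumes that Linux file format and checks only `avx2` and `fma`. This is a simplification, since -march=haswell enables other extensions too.

```python
import os

# Feature flags (as named in Linux /proc/cpuinfo) that an AVX2/FMA3 build
# relies on. -march=haswell enables more than these two (e.g. BMI1/BMI2,
# MOVBE); checking only these is a simplifying assumption.
REQUIRED = {"avx2", "fma"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags on the first 'flags' line of
    /proc/cpuinfo-style text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def can_run_haswell_build(flags):
    """True if every flag in REQUIRED is present."""
    return REQUIRED <= flags


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("AVX2 + FMA available:", can_run_haswell_build(cpu_flags(f.read())))
```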

With ifort you can use vectorization report flags to see which parts of the program were vectorized. For example, -vec-report=1 prints this information during compilation (newer ifort versions replace -vec-report with -qopt-report-phase=vec). gfortran has an equivalent: -fopt-info-vec, or -ftree-vectorizer-verbose in older GCC releases.

answered Sep 24 '22 17:09 by Xiaolei Zhu
