
vectorized sum in Fortran

I am compiling my Fortran code with gfortran and -mavx, and I have verified via objdump that some instructions are vectorized. However, I am not getting the speedup I expected, so I want to make sure the following expression is being vectorized (this single statement accounts for ~50% of the runtime).

I know that some operations can be vectorized while others cannot, so I want to make sure this one can be:

sum(A(i1:i2,ir))

Again, this single line takes about 50% of the runtime because I am evaluating it over a very large matrix. I can give more detail on why I am doing this, but suffice it to say that it is necessary. I can restructure the memory layout if needed (for example, I could compute the sum as sum(A(ir,i1:i2)) if that could be vectorized instead).

Is this line being vectorized? How can I tell? How do I force vectorization if it is not being vectorized?

EDIT: Thanks to the comments, I now realize that I can check whether this summation is vectorized via -ftree-vectorizer-verbose, and I see that it is not vectorizing. I have restructured the code as follows:

tsum = 0.0d0
tn = i2 - i1 + 1
tvec(1:tn) = A(i1:i2, ir)
do ii = 1,tn
    tsum = tsum + tvec(ii)
enddo

and this ONLY vectorizes when I turn on -funsafe-math-optimizations, but I do see another 70% speed increase due to vectorization. The question still holds: Why does sum(A(i1:i2,ir)) not vectorize and how can I get a simple sum to vectorize?
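A stand-alone reproducer along the following lines may make it easier to inspect the vectorizer's report for just this statement. It is only a sketch: the program name, array sizes, and index values are made up for illustration, and the compile lines in the comments assume a GCC recent enough to accept -fopt-info-vec (the newer replacement for -ftree-vectorizer-verbose).

```fortran
! Hypothetical reproducer; names, sizes and indices are illustrative only.
! Compile with a vectorization report, for example:
!   gfortran -O3 -mavx -fopt-info-vec -fopt-info-vec-missed sum_test.f90
! then add -funsafe-math-optimizations and compare the two reports.
! (-O3 enables GCC's auto-vectorizer.)
program sum_test
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000, m = 8
  real(dp), allocatable :: A(:,:)
  real(dp) :: tsum
  integer :: i1, i2, ir

  allocate(A(n, m))
  call random_number(A)

  i1 = 10
  i2 = 900
  ir = 3

  ! Fortran arrays are column-major, so A(i1:i2, ir) is contiguous in memory.
  tsum = sum(A(i1:i2, ir))
  print *, 'tsum = ', tsum
end program sum_test
```

Because the slice runs along the first index, it is already stride-1; the alternative layout sum(A(ir,i1:i2)) would access memory with a stride equal to the leading dimension.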

drjrm3 asked Aug 27 '15 at 18:08


1 Answer

It turns out that I am not able to make use of the vectorization unless I include -ffast-math or -funsafe-math-optimizations.

The two code snippets I played with are:

! Copy the column slice into a contiguous work vector, then accumulate.
tsum = 0.0d0
n = i2 - i1 + 1
tvec(1:n) = A(i1:i2, ir)
do ii = 1,n
    tsum = tsum + tvec(ii)
enddo

and

tsum = sum(A(i1:i2,ir))

and here are the times I get when running the first code snippet with different compilation options:

10.62 sec ... None
10.35 sec ... -mtune=native -mavx
 7.44 sec ... -mtune=native -mavx -ffast-math
 7.49 sec ... -mtune=native -mavx -funsafe-math-optimizations

Finally, with these same options, I am also able to vectorize the second snippet, tsum = sum(A(i1:i2,ir)), which gives:

 7.96 sec ... None
 8.41 sec ... -mtune=native -mavx
 5.06 sec ... -mtune=native -mavx -ffast-math
 4.97 sec ... -mtune=native -mavx -funsafe-math-optimizations

Comparing the sum version compiled with -mtune=native -mavx (8.41 sec) against -mtune=native -mavx -funsafe-math-optimizations (4.97 sec) shows a ~70% speedup, i.e. about 1.7x. (Note that each configuration was run only once; before we publish we will do proper benchmarking over multiple runs.)
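If the unsafe-math flags are not acceptable, one possible alternative is to spell out the reordering in the source by keeping several independent partial sums. The following is a minimal sketch, not what was timed above: the program name, the vector length, and the choice of four accumulators (four doubles fill one 256-bit AVX register) are assumptions, and whether gfortran actually turns the loop into packed AVX adds should be checked with the vectorizer report.

```fortran
! Hypothetical sketch: a sum with four explicit partial accumulators.
! The summation order is fixed by the source, so the compiler does not
! need permission to reassociate floating-point additions.
program partial_sums
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1003      ! deliberately not a multiple of 4
  real(dp) :: tvec(n), acc(4), tsum
  integer :: ii, nblk

  call random_number(tvec)

  ! Lane k accumulates tvec(k), tvec(k+4), tvec(k+8), ...
  acc = 0.0_dp
  nblk = n - mod(n, 4)
  do ii = 1, nblk, 4
     acc = acc + tvec(ii:ii+3)
  end do

  ! Combine the four lanes, then add the leftover elements one by one.
  tsum = (acc(1) + acc(2)) + (acc(3) + acc(4))
  do ii = nblk + 1, n
     tsum = tsum + tvec(ii)
  end do

  print *, 'partial-sum result :', tsum
  print *, 'intrinsic sum      :', sum(tvec)
end program partial_sums
```

The two printed values can differ in the last few digits, because the partial-sum version adds the elements in a different order than a strictly sequential loop.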

I do take a small hit in accuracy, though. My values change slightly when I use -ffast-math or -funsafe-math-optimizations. Without them, the errors for my variables (v1, v2) are:

v1 ... 5.60663e-15     9.71445e-17     1.05471e-15
v2 ... 5.11674e-14     1.79301e-14     2.58127e-15

but with the optimizations, the errors are:

v1 ... 7.11931e-15     5.39846e-15     3.33067e-16
v2 ... 1.97273e-13     6.98608e-14     2.17742e-14

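which indicates that the arithmetic really is being carried out differently: these flags allow the compiler to reorder floating-point additions, and a vectorized sum adds the elements in a different order than the sequential loop, so the rounding changes.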
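The change in the trailing digits is consistent with floating-point addition not being associative, which is also why the compiler will not reorder the sum on its own. A small illustration, with values chosen purely for demonstration (they are not taken from the computation above) and intended to be compiled without -ffast-math:

```fortran
! Hypothetical demo: the same three numbers summed in two different orders.
! Compile without -ffast-math so the parenthesised order is respected.
program assoc_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! volatile keeps the compiler from folding the sums at compile time
  real(dp), volatile :: a, b, c
  real(dp) :: left, right

  a =  1.0e16_dp
  b = -1.0e16_dp
  c =  1.0_dp

  left  = (a + b) + c   ! a and b cancel exactly first, then c is added
  right = a + (b + c)   ! c is lost when added to the much larger b

  print *, '(a + b) + c = ', left
  print *, 'a + (b + c) = ', right
end program assoc_demo
```

With IEEE 754 double precision the two results should differ (1.0 versus 0.0), since 1.0 is smaller than the spacing between representable numbers near 1.0e16.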

drjrm3 answered Sep 30 '22 at 17:09