I ran across this question on scicomp which involves computing a sum. There, you can see a c++ and a similar fortran implementation. Interestingly I saw the fortran version was faster by about 32%.
I thought, I was not sure about their result and tried to regenerate the situation. Here is the (very slightly) different codes I ran:
c++
#include <iostream>
#include <complex>
#include <cmath>
#include <iomanip>
int main ()
{
const double alpha = 1;
std::cout.precision(16);
std::complex<double> sum = 0;
const std::complex<double> a = std::complex<double>(1,1)/std::sqrt(2.);
for (unsigned int k=1; k<10000000; ++k)
{
sum += std::pow(a, k)*std::pow(k, -alpha);
if (k % 1000000 == 0)
std::cout << k << ' ' << sum << std::endl;
}
return 0;
}
fortran
implicit none
integer, parameter :: dp = kind(0.d0)
complex(dp), parameter :: i_ = (0, 1)
real(dp) :: alpha = 1
complex(dp) :: s = 0
integer :: k
do k = 1, 10000000
s = s + ((i_+1)/sqrt(2._dp))**k * k**(-alpha)
if (modulo(k, 1000000) == 0) print *, k, s
end do
end
I compile the above codes using gcc 4.6.3
and clang 3.0
on a Ubuntu 12.04 LTS
machine all with -O3
flag. Here's my timings:
time ./a.out
gfortran
real 0m1.538s
user 0m1.536s
sys 0m0.000s
g++
real 0m2.225s
user 0m2.228s
sys 0m0.000s
clang
real 0m1.250s
user 0m1.244s
sys 0m0.004s
Interestingly I can also see that the fortran
code is faster than the c++
by about the same 32% when gcc
is used. Using clang
, however, I can see that the c++
code actually runs faster by about 19%. Here are my questions:
clang
doing so well here? Is there a fortran front-end for llvm compiler? If there, will the code generated by that one be even faster?UPDATE:
Using -ffast-math -O3
options generates the following results:
gfortran
real 0m1.515s
user 0m1.512s
sys 0m0.000s
g++
real 0m1.478s
user 0m1.476s
sys 0m0.000s
clang
real 0m1.253s
user 0m1.252s
sys 0m0.000s
Npw g++
version is running as fast gfortran
and still clang
is faster than both. Adding -fcx-fortran-rules
to the above options does not significantly change the results
The time differences will be related to the time it takes to execute pow
, as the other code is relatively simple. You can check this by profiling. The question then is what the compiler does to compute the power function?
My timings: ~1.20 s for the Fortran version with gfortran -O3
, and 1.07 s for the C++ version compiled with g++ -O3 -ffast-math
. Note that -ffast-math
doesn't matter for gfortran
, as pow
will be called from a library, but it makes a huge difference for g++
.
In my case, for gfortran
, it's the function _gfortran_pow_c8_i4
that gets called (source code). Their implementation is the usual way to compute integer powers. With g++
on the other hand, it's a function template from the libstdc++ library, but I don't know how that's implemented. Apparently, it's slightly better written/optimizable. I don't know to what extent the function is compiled on the fly, considering it's a template. For what it's worth, the Fortran version compiled with ifort
and C++ version compiled with icc
(using -fast
optimization flag) both give the same timings, so I guess these use the same library functions.
If I just write a power function in Fortran with complex arithmetic (explicitely writing out real and imaginary parts), it's as fast as the C++ version compiled with g++
(but then -ffast-math
slows it down, so I stuck to only -O3
with gfortran
):
complex(8) function pow_c8_i4(a, k)
implicit none
integer, intent(in) :: k
complex(8), intent(in) :: a
real(8) :: Re_a, Im_a, Re_pow, Im_pow, tmp
integer :: i
Re_pow = 1.0_8
Im_pow = 0.0_8
Re_a = real(a)
Im_a = aimag(a)
i = k
do while (i.ne.0)
if (iand(i,1).eq.1) then
tmp = Re_pow
Re_pow = Re_pow*Re_a-Im_pow*Im_a
Im_pow = tmp *Im_a+Im_pow*Re_a
end if
i = ishft(i,-1)
tmp = Re_a
Re_a = Re_a**2-Im_a**2
Im_a = 2*tmp*Im_a
end do
pow_c8_i4 = cmplx(Re_pow,Im_pow,8)
end function
In my experience, using explicit real and imaginary parts in Fortran implementations is faster, allthough it's very convenient of course to use the complex types.
Final note: even though it's just an example, the way to call the power function each iteration is extremely inefficient. Instead, you should of course just multiply a
by itself each iteration.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With