I ran across this question on scicomp which involves computing a sum. There, you can see a c++ and a similar fortran implementation. Interestingly I saw the fortran version was faster by about 32%.
I thought, I was not sure about their result and tried to regenerate the situation. Here is the (very slightly) different codes I ran:
c++
#include <iostream>
#include <complex>
#include <cmath>
#include <iomanip>
int main ()
{
    const double alpha = 1;
    std::cout.precision(16);
    std::complex<double> sum = 0;
    const std::complex<double> a = std::complex<double>(1,1)/std::sqrt(2.);
    for (unsigned int k=1; k<10000000; ++k)
    {
        sum += std::pow(a, k)*std::pow(k, -alpha);
        if (k % 1000000 == 0)
            std::cout << k << ' ' << sum << std::endl;
    }
    return 0;
}
fortran
implicit none
integer, parameter :: dp = kind(0.d0)
complex(dp), parameter :: i_ = (0, 1)
real(dp) :: alpha = 1
complex(dp) :: s = 0
integer :: k
do k = 1, 10000000
    s = s + ((i_+1)/sqrt(2._dp))**k * k**(-alpha)
    if (modulo(k, 1000000) == 0) print *, k, s
end do
end
I compile the above codes using gcc 4.6.3 and clang 3.0 on a Ubuntu 12.04 LTS machine all with -O3 flag. Here's my timings:
time ./a.out
gfortran
real    0m1.538s
user    0m1.536s
sys     0m0.000s
g++
real    0m2.225s
user    0m2.228s
sys     0m0.000s
clang
real    0m1.250s
user    0m1.244s
sys     0m0.004s
Interestingly I can also see that the fortran code is faster than the c++ by about the same 32% when gcc is used. Using clang, however, I can see that the c++ code actually runs faster by about 19%. Here are my questions:
clang doing so well here? Is there a fortran front-end for llvm compiler? If there, will the code generated by that one be even faster?UPDATE:
Using -ffast-math -O3 options generates the following results:
gfortran
real    0m1.515s
user    0m1.512s
sys     0m0.000s
g++
real    0m1.478s
user    0m1.476s
sys     0m0.000s
clang
real    0m1.253s
user    0m1.252s
sys     0m0.000s
Npw g++ version is running as fast gfortran and still clang is faster than both. Adding -fcx-fortran-rules to the above options does not significantly change the results
The time differences will be related to the time it takes to execute pow, as the other code is relatively simple. You can check this by profiling. The question then is what the compiler does to compute the power function?
My timings: ~1.20 s for the Fortran version with gfortran -O3, and 1.07 s for the C++ version compiled with g++ -O3 -ffast-math. Note that -ffast-math doesn't matter for gfortran, as pow will be called from a library, but it makes a huge difference for g++.
In my case, for gfortran, it's the function _gfortran_pow_c8_i4 that gets called (source code). Their implementation is the usual way to compute integer powers. With g++ on the other hand, it's a function template from the libstdc++ library, but I don't know how that's implemented. Apparently, it's slightly better written/optimizable. I don't know to what extent the function is compiled on the fly, considering it's a template. For what it's worth, the Fortran version compiled with ifort and C++ version compiled with icc (using -fast optimization flag) both give the same timings, so I guess these use the same library functions.
If I just write a power function in Fortran with complex arithmetic (explicitely writing out real and imaginary parts), it's as fast as the C++ version compiled with g++ (but then -ffast-math slows it down, so I stuck to only -O3 with gfortran):
complex(8) function pow_c8_i4(a, k)
implicit none
integer, intent(in) :: k
complex(8), intent(in) :: a
real(8) :: Re_a, Im_a, Re_pow, Im_pow, tmp
integer :: i
Re_pow = 1.0_8
Im_pow = 0.0_8
Re_a = real(a)
Im_a = aimag(a)
i = k
do while (i.ne.0)
  if (iand(i,1).eq.1) then
    tmp = Re_pow
    Re_pow = Re_pow*Re_a-Im_pow*Im_a
    Im_pow = tmp   *Im_a+Im_pow*Re_a
  end if
  i = ishft(i,-1)
  tmp = Re_a
  Re_a = Re_a**2-Im_a**2
  Im_a = 2*tmp*Im_a
end do
pow_c8_i4 = cmplx(Re_pow,Im_pow,8)
end function
In my experience, using explicit real and imaginary parts in Fortran implementations is faster, allthough it's very convenient of course to use the complex types.
Final note: even though it's just an example, the way to call the power function each iteration is extremely inefficient.  Instead, you should of course just multiply a by itself each iteration.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With