
tan() computation is two times longer than sin()/cos() with g++ 4.8.2

Tags: c++, c++11, g++4.8

I'm working on algorithms that make heavy use of math functions, and we recently ported the code from a Solaris platform to g++ 4.8.2 on an Ubuntu system.

Surprisingly, some of the algorithms now take much longer than before. The reason is that std::tan() takes about twice as long as computing std::sin()/std::cos().

Replacing tan with sin/cos considerably reduced the computing time while giving the same results. I wonder why there is such a difference. Is it because of the implementation of the standard library? Shouldn't the tan function be more efficient?

I wrote a program to time the functions:

#include <cmath>
#include <iostream>
#include <chrono>

int main(int argc, char * argv[])
{
    using namespace std::chrono;

    auto start_tan = system_clock::now();

    for (int i = 0; i < 50000; ++i)
    {
        const double & a = static_cast<double>(i);
        const double & b = std::tan(a);
    }

    auto end_tan = system_clock::now();
    auto elapsed_time_tan = end_tan - start_tan;
    std::cout << "tan : ";
    std::cout << elapsed_time_tan.count() << std::endl;

    auto start_sincos = system_clock::now();

    for (int i =  0; i < 50000; ++i)
    {
        const double & a = static_cast<double>(i);
        const double & b = std::sin(a) / std::cos(a);
    }

    auto end_sincos = system_clock::now();
    auto elapsed_time_sincos = end_sincos - start_sincos;
    std::cout << "sincos : " << elapsed_time_sincos.count() << std::endl;

}

And indeed, without optimisation I get the following timings:

tan : 8319960
sincos : 4736988

And with optimisation (-O2):

tan : 294
sincos : 120

Does anyone have an idea what causes this behaviour?

EDIT

I modified the program following @Basile Starynkevitch's answer:

#include <cmath>
#include <cstdlib>   // std::atoi
#include <iostream>
#include <chrono>

int main(int argc, char * argv[])
{
    using namespace std::chrono;

    if (argc != 2)
    {
        std::cout << "Need exactly one argument: the number of iterations." << std::endl;
        return 1;
    }

    // Iteration count comes from the command line so it is not a compile-time constant.
    int nb_iter = std::atoi(argv[1]);
    std::cout << "Number of iterations programmed : " << nb_iter << std::endl;

    // Sum the results so the compiler cannot discard the loop.
    double tan_sum = 0.0;
    auto start_tan = system_clock::now();
    for (int i = 0; i < nb_iter; ++i)
    {
        const double a = static_cast<double>(i);
        const double b = std::tan(a);
        tan_sum += b;
    }

    auto end_tan = system_clock::now();
    auto elapsed_time_tan = end_tan - start_tan;
    std::cout << "tan : " << elapsed_time_tan.count() << std::endl;
    std::cout << "tan sum : " << tan_sum << std::endl;

    // Same loop, computing the tangent as sin/cos.
    double sincos_sum = 0.0;
    auto start_sincos = system_clock::now();
    for (int i = 0; i < nb_iter; ++i)
    {
        const double a = static_cast<double>(i);
        const double b = std::sin(a) / std::cos(a);
        sincos_sum += b;
    }

    auto end_sincos = system_clock::now();
    auto elapsed_time_sincos = end_sincos - start_sincos;
    std::cout << "sincos : " << elapsed_time_sincos.count() << std::endl;
    std::cout << "sincos sum : " << sincos_sum << std::endl;

    return 0;
}

Now I get similar times with -O2 only:

tan : 8345021
sincos : 7838740

With -O2 -mtune=native both are faster, but the difference is still there:

tan : 5426201
sincos : 3721938

I won't use -ffast-math because I need to keep IEEE compliance.

dkg asked Dec 29 '25 05:12


2 Answers

You cannot trust non-optimized code for this.

Regarding optimization, the GCC compiler is probably throwing out the loop, since you don't do anything with the result. BTW b should not be a const double& reference but a const double.

If you want a meaningful benchmark, try storing b (or summing it). And make the number of iterations (50000) a runtime parameter (e.g. int nbiter = (argc>1)?atoi(argv[1]):1000;)

You might want to pass -O2 -ffast-math -mtune=native as optimization flags to g++ (beware that -ffast-math allows floating-point optimizations that are not strictly standard compliant).

With those flags and with my changes:

double sumtan=0.0, sumsincos=0.0;
int nbiter = argc>1?atoi(argv[1]):10000;
for (int i = 0; i < nbiter; ++i)
{
    const double & a = static_cast<double>(i);
    const double  b = std::tan(a);
    sumtan += b;
}
for (int i =  0; i < nbiter; ++i)
{
    const double & a = static_cast<double>(i);
    const double  b = std::sin(a) / std::cos(a);
    sumsincos += b;
}
std::cout << "tan : "  << elapsed_time_tan.count() 
          << " sumtan=" << sumtan << std::endl;

std::cout << "sincos : " << elapsed_time_sincos.count() 
          << " sumsincos=" << sumsincos << std::endl;

compiled with GCC 4.9.2 using

 g++ -std=c++11 -O2 -Wall -ffast-math -mtune=native b.cc -o b.bin

I'm getting quite similar timings:

  % ./b.bin 1000000
  tan :    77158579 sumtan=    -3.42432e+06
  sincos : 70219657 sumsincos= -3.42432e+06

This is on a four-year-old desktop (Intel(R) Xeon(R) CPU X3430 @ 2.40GHz).

When compiling with clang++ 3.5.0 I get:

tan :     78098229 sumtan=    -3.42432e+06
sincos : 106817614 sumsincos= -3.42432e+06

PS. Timing (and relative performance) is different with -O3. Also, some processors have machine instructions for sin, cos and tan, but they might not be used (because the compiler or libm knows that they are slower than a software routine). GCC has builtins for these.

Basile Starynkevitch answered Dec 31 '25 23:12


Read the Intel developer's manual. The trig functions are not as accurate as the other math functions on the x86, so sin / cos will not necessarily give exactly the same result as tan, which is something you should bear in mind if IEEE compliance is your reason for asking this.

As for the speed-up, sin and cos can be obtained from the same instruction, so long as the compiler is not brain dead. Computing tan to the same accuracy is more work. The compiler therefore cannot substitute sin/cos for tan without breaking the standard.

Depending on whether these last decimal places matter to you or not, you may want to look at this question: What is the error of trigonometric instructions on x86?

camelccc answered Dec 31 '25 23:12