I know this is a recurring question, but I haven't really found a useful answer yet. I'm basically looking for a fast approximation of the function <code>acos</code> in C++, I'd like to know if I can significantly beat the standard one. But some of you might have insights on my specific problem: I'm writing a scientific program which I need to be very fast. The complexity of the main algorithm boils down to computing the following expression (many times with different parameters): <pre class="prettyprint"><code>sin( acos(t_1) + acos(t_2) + ... + acos(t_n) ) </code></pre> where the <code>t_i</code> are known real (double) numbers, and <code>n</code> is very small (like smaller than 6). I need a precision of at least 1e-10. I'm currently using the standard <code>sin</code> and <code>acos</code> C++ functions. Do you think I can significantly gain speed somehow? For those of you who know some maths, do you think it would be smart to expand that sine in order to get an algebraic expression in terms of the <code>t_i</code> (only involving square roots)? Thank you your your answers.

The code below provides simple implementations of <code>sin()</code> and <code>acos()</code> that should satisfy your accuracy requirements and that you might want to try. Please note that the math library implementation on your platform is very likely highly tuned for the specific hardware capabilities of that platform and is probably also coded in assembly for maximum efficiency, so simple compiled C code not catering to specifics of the hardware is unlikely to provide higher performance, even when the accuracy requirements are somewhat relaxed from full double precision. As Viktor Latypov points out, it may also be worthwhile to search for algorithmic alternatives that do not require expensive calls to transcendental math functions. In the code below I have tried to stick to simple, portable constructs. If your compiler supports the <code>rint()</code> function [specified by C99 and C++11] you might want to use that instead of <code>my_rint()</code>. On some platforms, the call to <code>floor()</code> can be expensive since it requires dynamic changing of machine state. The functions <code>my_rint()</code>, <code>sin_core()</code>, <code>cos_core()</code>, and <code>asin_core()</code> would want to be inlined for best performance. Your compiler may do that automatically at high optimization levels (e.g. when compiling with <code>-O3</code>), or you could add an appropriate inlining attribute to these functions, e.g. inline or __inline depending on your toolchain. Not knowing anything about your platform I opted for simple polynomial approximations, which are evaluated using Estrin's scheme plus Horner's scheme. See Wikipedia for a description of these evaluation schemes: http://en.wikipedia.org/wiki/Estrin%27s_scheme , http://en.wikipedia.org/wiki/Horner_scheme The approximations themselves are of the minimax type and were custom generated for this answer with the Remez algorithm: http://en.wikipedia.org/wiki/Minimax_approximation_algorithm , http://en.wikipedia.org/wiki/Remez_algorithm The identities used in the argument reduction for <code>acos()</code> are noted in the comments, for <code>sin()</code> I used a Cody/Waite-style argument reduction, as described in the following book: W. J. Cody, W. Waite, Software Manual for the Elementary Functions. Prentice-Hall, 1980 The error bounds mentioned in the comments are approximate, and have not been rigorously tested or proven. <pre class="prettyprint"><code>/* not quite rint(), i.e. results not properly rounded to nearest-or-even */ double my_rint (double x) { double t = floor (fabs(x) + 0.5); return (x < 0.0) ? -t : t; } /* minimax approximation to cos on [-pi/4, pi/4] with rel. err. ~= 7.5e-13 */ double cos_core (double x) { double x8, x4, x2; x2 = x * x; x4 = x2 * x2; x8 = x4 * x4; /* evaluate polynomial using Estrin's scheme */ return (-2.7236370439787708e-7 * x2 + 2.4799852696610628e-5) * x8 + (-1.3888885054799695e-3 * x2 + 4.1666666636943683e-2) * x4 + (-4.9999999999963024e-1 * x2 + 1.0000000000000000e+0); } /* minimax approximation to sin on [-pi/4, pi/4] with rel. err. ~= 5.5e-12 */ double sin_core (double x) { double x4, x2, t; x2 = x * x; x4 = x2 * x2; /* evaluate polynomial using a mix of Estrin's and Horner's scheme */ return ((2.7181216275479732e-6 * x2 - 1.9839312269456257e-4) * x4 + (8.3333293048425631e-3 * x2 - 1.6666666640797048e-1)) * x2 * x + x; } /* minimax approximation to arcsin on [0, 0.5625] with rel. err. ~= 1.5e-11 */ double asin_core (double x) { double x8, x4, x2; x2 = x * x; x4 = x2 * x2; x8 = x4 * x4; /* evaluate polynomial using a mix of Estrin's and Horner's scheme */ return (((4.5334220547132049e-2 * x2 - 1.1226216762576600e-2) * x4 + (2.6334281471361822e-2 * x2 + 2.0596336163223834e-2)) * x8 + (3.0582043602875735e-2 * x2 + 4.4630538556294605e-2) * x4 + (7.5000364034134126e-2 * x2 + 1.6666666300567365e-1)) * x2 * x + x; } /* relative error < 7e-12 on [-50000, 50000] */ double my_sin (double x) { double q, t; int quadrant; /* Cody-Waite style argument reduction */ q = my_rint (x * 6.3661977236758138e-1); quadrant = (int)q; t = x - q * 1.5707963267923333e+00; t = t - q * 2.5633441515945189e-12; if (quadrant & 1) { t = cos_core(t); } else { t = sin_core(t); } return (quadrant & 2) ? -t : t; } /* relative error < 2e-11 on [-1, 1] */ double my_acos (double x) { double xa, t; xa = fabs (x); /* arcsin(x) = pi/2 - 2 * arcsin (sqrt ((1-x) / 2)) * arccos(x) = pi/2 - arcsin(x) * arccos(x) = 2 * arcsin (sqrt ((1-x) / 2)) */ if (xa > 0.5625) { t = 2.0 * asin_core (sqrt (0.5 * (1.0 - xa))); } else { t = 1.5707963267948966 - asin_core (xa); } /* arccos (-x) = pi - arccos(x) */ return (x < 0.0) ? (3.1415926535897932 - t) : t; } </code></pre>

C++ (and maths) : fast approximation of a trigonometric function

Tags:

c++

performance

trigonometry

I know this is a recurring question, but I haven't really found a useful answer yet. I'm basically looking for a fast approximation of the function acos in C++, I'd like to know if I can significantly beat the standard one.

But some of you might have insights on my specific problem: I'm writing a scientific program which I need to be very fast. The complexity of the main algorithm boils down to computing the following expression (many times with different parameters):

sin( acos(t_1) + acos(t_2) + ... + acos(t_n) )

where the t_i are known real (double) numbers, and n is very small (like smaller than 6). I need a precision of at least 1e-10. I'm currently using the standard sin and acos C++ functions.

Do you think I can significantly gain speed somehow? For those of you who know some maths, do you think it would be smart to expand that sine in order to get an algebraic expression in terms of the t_i (only involving square roots)?

Thank you your your answers.

393

asked Jun 29 '12 11:06

Seub

1 Answers

The code below provides simple implementations of sin() and acos() that should satisfy your accuracy requirements and that you might want to try. Please note that the math library implementation on your platform is very likely highly tuned for the specific hardware capabilities of that platform and is probably also coded in assembly for maximum efficiency, so simple compiled C code not catering to specifics of the hardware is unlikely to provide higher performance, even when the accuracy requirements are somewhat relaxed from full double precision. As Viktor Latypov points out, it may also be worthwhile to search for algorithmic alternatives that do not require expensive calls to transcendental math functions.

In the code below I have tried to stick to simple, portable constructs. If your compiler supports the rint() function [specified by C99 and C++11] you might want to use that instead of my_rint(). On some platforms, the call to floor() can be expensive since it requires dynamic changing of machine state. The functions my_rint(), sin_core(), cos_core(), and asin_core() would want to be inlined for best performance. Your compiler may do that automatically at high optimization levels (e.g. when compiling with -O3), or you could add an appropriate inlining attribute to these functions, e.g. inline or __inline depending on your toolchain.

Not knowing anything about your platform I opted for simple polynomial approximations, which are evaluated using Estrin's scheme plus Horner's scheme. See Wikipedia for a description of these evaluation schemes:

http://en.wikipedia.org/wiki/Estrin%27s_scheme , http://en.wikipedia.org/wiki/Horner_scheme

The approximations themselves are of the minimax type and were custom generated for this answer with the Remez algorithm:

http://en.wikipedia.org/wiki/Minimax_approximation_algorithm , http://en.wikipedia.org/wiki/Remez_algorithm

The identities used in the argument reduction for acos() are noted in the comments, for sin() I used a Cody/Waite-style argument reduction, as described in the following book:

W. J. Cody, W. Waite, Software Manual for the Elementary Functions. Prentice-Hall, 1980

The error bounds mentioned in the comments are approximate, and have not been rigorously tested or proven.

/* not quite rint(), i.e. results not properly rounded to nearest-or-even */
double my_rint (double x)
{
  double t = floor (fabs(x) + 0.5);
  return (x < 0.0) ? -t : t;
}

/* minimax approximation to cos on [-pi/4, pi/4] with rel. err. ~= 7.5e-13 */
double cos_core (double x)
{
  double x8, x4, x2;
  x2 = x * x;
  x4 = x2 * x2;
  x8 = x4 * x4;
  /* evaluate polynomial using Estrin's scheme */
  return (-2.7236370439787708e-7 * x2 + 2.4799852696610628e-5) * x8 +
         (-1.3888885054799695e-3 * x2 + 4.1666666636943683e-2) * x4 +
         (-4.9999999999963024e-1 * x2 + 1.0000000000000000e+0);
}

/* minimax approximation to sin on [-pi/4, pi/4] with rel. err. ~= 5.5e-12 */
double sin_core (double x)
{
  double x4, x2, t;
  x2 = x * x;
  x4 = x2 * x2;
  /* evaluate polynomial using a mix of Estrin's and Horner's scheme */
  return ((2.7181216275479732e-6 * x2 - 1.9839312269456257e-4) * x4 + 
          (8.3333293048425631e-3 * x2 - 1.6666666640797048e-1)) * x2 * x + x;
}

/* minimax approximation to arcsin on [0, 0.5625] with rel. err. ~= 1.5e-11 */
double asin_core (double x)
{
  double x8, x4, x2;
  x2 = x * x;
  x4 = x2 * x2;
  x8 = x4 * x4;
  /* evaluate polynomial using a mix of Estrin's and Horner's scheme */
  return (((4.5334220547132049e-2 * x2 - 1.1226216762576600e-2) * x4 +
           (2.6334281471361822e-2 * x2 + 2.0596336163223834e-2)) * x8 +
          (3.0582043602875735e-2 * x2 + 4.4630538556294605e-2) * x4 +
          (7.5000364034134126e-2 * x2 + 1.6666666300567365e-1)) * x2 * x + x; 
}

/* relative error < 7e-12 on [-50000, 50000] */
double my_sin (double x)
{
  double q, t;
  int quadrant;
  /* Cody-Waite style argument reduction */
  q = my_rint (x * 6.3661977236758138e-1);
  quadrant = (int)q;
  t = x - q * 1.5707963267923333e+00;
  t = t - q * 2.5633441515945189e-12;
  if (quadrant & 1) {
    t = cos_core(t);
  } else {
    t = sin_core(t);
  }
  return (quadrant & 2) ? -t : t;
}

/* relative error < 2e-11 on [-1, 1] */
double my_acos (double x)
{
  double xa, t;
  xa = fabs (x);
  /* arcsin(x) = pi/2 - 2 * arcsin (sqrt ((1-x) / 2)) 
   * arccos(x) = pi/2 - arcsin(x)
   * arccos(x) = 2 * arcsin (sqrt ((1-x) / 2))
   */
  if (xa > 0.5625) {
    t = 2.0 * asin_core (sqrt (0.5 * (1.0 - xa)));
  } else {
    t = 1.5707963267948966 - asin_core (xa);
  }
  /* arccos (-x) = pi - arccos(x) */
  return (x < 0.0) ? (3.1415926535897932 - t) : t;
}

answered Sep 21 '22 09:09

njuffa

Related questions
                            
                                Why does std::fstream set the EOF bit the way it does?
                            
                                Using SQLite with Qt
                            
                                LLVM JIT segfaults. What am I doing wrong?
                            
                                Using boost::numeric_cast<>
                            
                                What is an Anonymous Object?
                            
                                Why is std::type_info noncopyable? Am I allowed to store it somewhere?
                            
                                is it possible to place std::vector to shared memory?
                            
                                C++ Operator () parenthesis overloading
                            
                                Audio/MIDI C++ library for a real-time application
                            
                                How to build Google RE2 for Windows?
                            
                                boost::program_options config file option with multiple tokens
                            
                                Understanding virtual base classes and constructor calls
                            
                                LTO, Devirtualization, and Virtual Tables
                            
                                Using bind1st for a method that takes argument by reference
                            
                                Stumped with Unicode, Boost, C++, codecvts
                            
                                How can I ease the syntactic overhead of checking iterator values in C++?
                            
                                boost::asio io_service thread pool
                            
                                How to know if a ".exe" process was written with C++ or C#? [duplicate]
                            
                                Why does using this C++ function twice in one line cause a compile error?
                            
                                C++11 std::function and perfect forwarding

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With