Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does table-based sin approximation literature always use this formula when another formula seems to make more sense?

The literature on computing the elementary function sin with tables refers to the formula:

sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h)

where x = Cn + h, Cn is a constant for which sin(Cn) and cos(Cn) have been pre-computed and are available in a table, and, if following Gal's method, Cn has been chosen so that both sin(Cn) and cos(Cn) are closely approximated by floating-point numbers. The quantity h is close to 0.0. An example of reference to this formula is this article (page 7).

I don't understand why this makes sense: cos(h), however it is computed, will probably be wrong by at least 0.5 ULP for some values of h, and since it is close to 1.0, this seems to have a drastic effect on the accuracy of the result sin(x) when computed this way.

I do not understand why the formula below is not used instead:

sin(x) = sin(Cn) + (sin(Cn) * (cos(h) - 1.0) + cos(Cn) * sin(h))

Then the two quantities (cos(h) - 1.0) and sin(h) can be approximated with polynomials that are easy to make accurate as they produce results near zero. The values for sin(Cn) * (cos(h) - 1.0), cos(Cn) * sin(h) and for their sum is still small and its absolute accuracy is expressed in ULPs of the small quantity that the sum represents, so that adding this quantity to sin(Cn) is almost correctly rounded.

Am I missing something that makes the earlier, popular, simpler formula behave well too? Do the writers take it for granted that the readers will understand that the first formula is actually implemented as the second formula?

EDIT: Example

A single-precision table to compute single-precision sinf() and cosf() might contain the following point in single-precision:

         f             |        cos f          |       sin f      
-----------------------+-----------------------+---------------------
0.017967 0x1.2660bcp-6 |    0x1.ffead8p-1      |    0x1.265caep-6
                       |    (actual value:)    |    (actual value:)
                       | ~0x1.ffead8000715dp-1 | ~0x1.265cae000e6f9p-6

The following functions are specialized single-precision functions to use around 0.017967:

float sinf_trad(float x)
{
  float h = x - 0x1.2660bcp-6f;

  return 0x1.265caep-6f * cos_0(h) + 0x1.ffead8p-1f * sin_0(h);
}

float sinf_new(float x)
{
  float h = x - 0x1.2660bcp-6f;

  return 0x1.265caep-6f + (0x1.265caep-6f * cosm1_0(h) + 0x1.ffead8p-1f * sin_0(h));
}

Testing these functions between 0.01f and 0.025f seems to show that the new formula gives more precise results:

$ gcc -std=c99 test.c && ./a.out 
relative error, traditional: 2.169624e-07, new: 1.288049e-07
sum of squares of absolute error, traditional: 6.616202e-12, new: 2.522784e-12

I took several shortcuts so please look at the complete program.

like image 712
Pascal Cuoq Avatar asked May 16 '14 19:05

Pascal Cuoq


2 Answers

The implementation below partially answers the question, in that it is a single-precision implementation of sine, using the formula suggested in the question, that is accurate to 0.53 ULP over [0 … 1.57], and accurate to 0.5 ULP for 99.98% of its arguments over this range.

Specifically, I get the output:

error 285758762/536870912 ULP sin(2.11219326e-01) ref:2.09652290e-01 new:2.09652275e-01 
differences: 176880 / 1070134723

Meaning the error was never more than 285/536 of an ULP (about 0.53 ULP), and 176880 is the number of arguments where the error was above 0.5 ULP, out of a total of 1070134723 arguments.

It does not seem possible to achieve this sort of result with the plain sin(Cn) * cos(h) + cos(Cn) * sin(h) formula and only single-precision computations. The article cited in the question alludes to “the term c0*h being evaluated in an extended precision” in order to achieve overall accuracy.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>

float c_cos_sin[][3] = {
  //  0x0.000000000p+0 /* 0.000000 */, 0x1.000000p+0, 0x0.000000p+0,
  //  0x0.00fb76590p+2 /* 0.015348 */, 0x1.fff090p-1, 0x1.f6e7a4p-7,
  //  0x0.01fd02f80p+2 /* 0.031068 */, 0x1.ffc0c0p-1, 0x1.fcee02p-6,
  //  0x0.0302f6280p+2 /* 0.047056 */, 0x1.ff6eeap-1, 0x1.8156aap-5,
  //  0x0.04029a400p+2 /* 0.062659 */, 0x1.fefec8p-1, 0x1.007b94p-4,
  //  0x0.0500a9d80p+2 /* 0.078165 */, 0x1.fe6fcap-1, 0x1.3fd706p-4,
  //  0x0.060215b80p+2 /* 0.093877 */, 0x1.fdbedcp-1, 0x1.7ff4e8p-4,
  //  0x0.070225580p+2 /* 0.109506 */, 0x1.fceee8p-1, 0x1.bfa3fcp-4,
  //  0x0.080460e00p+2 /* 0.125267 */, 0x1.fbfcf6p-1, 0x1.ffc0f6p-4,
  //  0x0.08fed4a00p+2 /* 0.140554 */, 0x1.faf372p-1, 0x1.1ee830p-3,
  //  0x0.0a0054100p+2 /* 0.156270 */, 0x1.f9c2d8p-1, 0x1.3ebd74p-3,
  //  0x0.0afc8eb00p+2 /* 0.171665 */, 0x1.f87978p-1, 0x1.5dd872p-3,
  0x0.0bff5db00p+2 /* 0.187461 */, 0x1.f707b0p-1, 0x1.7dad14p-3,
  0x0.0cfe70200p+2 /* 0.203030 */, 0x1.f57bcep-1, 0x1.9cf438p-3,
  0x0.0e024ef00p+2 /* 0.218891 */, 0x1.f3c87ap-1, 0x1.bcb7a0p-3,
  0x0.0efeab400p+2 /* 0.234294 */, 0x1.f202ecp-1, 0x1.db74a8p-3,
  0x0.10003da00p+2 /* 0.250015 */, 0x1.f014d0p-1, 0x1.fab664p-3,
  0x0.110242c00p+2 /* 0.265763 */, 0x1.ee0660p-1, 0x1.0cf2f4p-2,
  0x0.12055d400p+2 /* 0.281577 */, 0x1.ebd62ap-1, 0x1.1c8a4ap-2,
  0x0.13025de00p+2 /* 0.297019 */, 0x1.e994c2p-1, 0x1.2bb212p-2,
  0x0.13fc96600p+2 /* 0.312292 */, 0x1.e73c4ep-1, 0x1.3a9d34p-2,
  0x0.15014c400p+2 /* 0.328204 */, 0x1.e4abbcp-1, 0x1.4a1472p-2,
  0x0.15fe27a00p+2 /* 0.343637 */, 0x1.e210eep-1, 0x1.58fffep-2,
  0x0.1703b1200p+2 /* 0.359600 */, 0x1.df4050p-1, 0x1.685884p-2,
  0x0.180296e00p+2 /* 0.375158 */, 0x1.dc63e8p-1, 0x1.7736b2p-2,
  0x0.18fc8a600p+2 /* 0.390414 */, 0x1.d9790cp-1, 0x1.85b472p-2,
  0x0.19ffac000p+2 /* 0.406230 */, 0x1.d654fap-1, 0x1.94a1ecp-2,
  0x0.1aff07c00p+2 /* 0.421816 */, 0x1.d31f26p-1, 0x1.a33e6ap-2,
  0x0.1c0162800p+2 /* 0.437585 */, 0x1.cfc21ep-1, 0x1.b1ec42p-2,
  0x0.1cfe63200p+2 /* 0.453027 */, 0x1.cc5a50p-1, 0x1.c0317ep-2,
  0x0.1e0153a00p+2 /* 0.468831 */, 0x1.c8c0f4p-1, 0x1.ceb01ep-2,
  0x0.1efe6d800p+2 /* 0.484279 */, 0x1.c52024p-1, 0x1.dcbe7ep-2,
  0x0.1ffde5600p+2 /* 0.499872 */, 0x1.c15a92p-1, 0x1.ead0fcp-2,
  0x0.20fa9ac00p+2 /* 0.515296 */, 0x1.bd83eap-1, 0x1.f89e82p-2,
  0x0.220491000p+2 /* 0.531529 */, 0x1.b95c6cp-1, 0x1.038212p-1,
  0x0.22ff9c800p+2 /* 0.546851 */, 0x1.b55542p-1, 0x1.0a3d7ap-1,
  0x0.23faafc00p+2 /* 0.562176 */, 0x1.b133aep-1, 0x1.10e916p-1,
  0x0.250a2cc00p+2 /* 0.578746 */, 0x1.ac9ed2p-1, 0x1.180d0ep-1,
  0x0.25fee2800p+2 /* 0.593682 */, 0x1.a863d2p-1, 0x1.1e6bdep-1,
  0x0.2700b4000p+2 /* 0.609418 */, 0x1.a3d498p-1, 0x1.251056p-1,
  0x0.28025e000p+2 /* 0.625144 */, 0x1.9f2b7ap-1, 0x1.2ba13ap-1,
  0x0.28f975400p+2 /* 0.640226 */, 0x1.9a9aa0p-1, 0x1.31db54p-1,
  0x0.29fc6dc00p+2 /* 0.656032 */, 0x1.95b7ecp-1, 0x1.384ef4p-1,
  0x0.2afc27c00p+2 /* 0.671640 */, 0x1.90cb6cp-1, 0x1.3e9a4ap-1,
  0x0.2c0659c00p+2 /* 0.687888 */, 0x1.8b90c6p-1, 0x1.45127ap-1,
  0x0.2d017dc00p+2 /* 0.703216 */, 0x1.868952p-1, 0x1.4b18dep-1,
  0x0.2e04f3c00p+2 /* 0.719052 */, 0x1.813e8cp-1, 0x1.513d70p-1,
  0x0.2efcb8800p+2 /* 0.734175 */, 0x1.7c19bcp-1, 0x1.5706f0p-1,
  0x0.300642800p+2 /* 0.750382 */, 0x1.767dc8p-1, 0x1.5d2464p-1,
  0x0.30ff5cc00p+2 /* 0.765586 */, 0x1.7123d0p-1, 0x1.62cb9cp-1,
  0x0.3204f6c00p+2 /* 0.781553 */, 0x1.6b6d98p-1, 0x1.68a4d6p-1,
  0x0.3303af000p+2 /* 0.797100 */, 0x1.65c70cp-1, 0x1.6e4010p-1,
  0x0.34002f400p+2 /* 0.812511 */, 0x1.601740p-1, 0x1.73b86cp-1,
  0x0.35080ac00p+2 /* 0.828616 */, 0x1.5a0f1cp-1, 0x1.79579cp-1,
  0x0.35fda7800p+2 /* 0.843607 */, 0x1.545d16p-1, 0x1.7e7cc6p-1,
  0x0.37040f800p+2 /* 0.859623 */, 0x1.4e31bep-1, 0x1.83e3aep-1,
  0x0.3800eac00p+2 /* 0.875056 */, 0x1.482b1cp-1, 0x1.89002ap-1,
  0x0.390737c00p+2 /* 0.891066 */, 0x1.41d5b8p-1, 0x1.8e3432p-1,
  0x0.39fce7800p+2 /* 0.906061 */, 0x1.3bd3dep-1, 0x1.92fc2ap-1,
  0x0.3b0596c00p+2 /* 0.922216 */, 0x1.3546c4p-1, 0x1.9808d0p-1,
  0x0.3bf971c00p+2 /* 0.937100 */, 0x1.2f2b58p-1, 0x1.9c979ep-1,
  0x0.3d0275800p+2 /* 0.953275 */, 0x1.2874c8p-1, 0x1.a17120p-1,
  0x0.3e02c4400p+2 /* 0.968919 */, 0x1.21e3cap-1, 0x1.a60740p-1,
  0x0.3ef759000p+2 /* 0.983847 */, 0x1.1b8ec4p-1, 0x1.aa4f02p-1,
  0x0.3ff90a800p+2 /* 0.999575 */, 0x1.14d158p-1, 0x1.aeb732p-1,
  0x0.40f703800p+2 /* 1.015077 */, 0x1.0e1baep-1, 0x1.b2f468p-1,
  0x0.420693000p+2 /* 1.031651 */, 0x1.06dcb2p-1, 0x1.b75f2ap-1,
  0x0.4300fb800p+2 /* 1.046935 */, 0x1.001dcep-1, 0x1.bb5678p-1,
  0x0.440282800p+2 /* 1.062653 */, 0x1.f23bb2p-2, 0x1.bf4efcp-1,
  0x0.44fb18000p+2 /* 1.077826 */, 0x1.e49a58p-2, 0x1.c3095ep-1,
  0x0.45fe26000p+2 /* 1.093637 */, 0x1.d647a4p-2, 0x1.c6cfaap-1,
  0x0.4700de800p+2 /* 1.109428 */, 0x1.c7dba0p-2, 0x1.ca77aap-1,
  0x0.47fd2d800p+2 /* 1.124828 */, 0x1.b9af14p-2, 0x1.cdec48p-1,
  0x0.48fd3c000p+2 /* 1.140456 */, 0x1.ab3138p-2, 0x1.d1515ep-1,
  0x0.49f66d000p+2 /* 1.155666 */, 0x1.9cfd2cp-2, 0x1.d48338p-1,
  0x0.4b05ec000p+2 /* 1.172236 */, 0x1.8d67dcp-2, 0x1.d7deb0p-1,
  0x0.4bfebf800p+2 /* 1.187424 */, 0x1.7f0718p-2, 0x1.dad544p-1,
  0x0.4cfa07800p+2 /* 1.202761 */, 0x1.706b10p-2, 0x1.ddb6e0p-1,
  0x0.4e0324800p+2 /* 1.218942 */, 0x1.60e920p-2, 0x1.e0a1e6p-1,
  0x0.4efdb7800p+2 /* 1.234236 */, 0x1.522b24p-2, 0x1.e34658p-1,
  0x0.4ffb51000p+2 /* 1.249714 */, 0x1.432afap-2, 0x1.e5d57ep-1,
  0x0.50fb6d000p+2 /* 1.265346 */, 0x1.33f0b2p-2, 0x1.e84ce2p-1,
  0x0.5200bc000p+2 /* 1.281295 */, 0x1.24536ep-2, 0x1.eab19cp-1,
  0x0.52fbc0800p+2 /* 1.296616 */, 0x1.1541a6p-2, 0x1.ece01ep-1,
  0x0.54066d800p+2 /* 1.312892 */, 0x1.052cfep-2, 0x1.ef1104p-1,
  0x0.550424000p+2 /* 1.328378 */, 0x1.eb9ff8p-3, 0x1.f1077cp-1,
  0x0.55f93c800p+2 /* 1.343337 */, 0x1.cdd470p-3, 0x1.f2cfeap-1,
  0x0.56fa9d000p+2 /* 1.359046 */, 0x1.ae6e42p-3, 0x1.f49074p-1,
  0x0.57ff02000p+2 /* 1.374939 */, 0x1.8e8e2cp-3, 0x1.f63612p-1,
  0x0.58f813000p+2 /* 1.390141 */, 0x1.6ff8f0p-3, 0x1.f7aaf6p-1,
  0x0.5a0036800p+2 /* 1.406263 */, 0x1.4f722cp-3, 0x1.f915dcp-1,
  0x0.5b005b800p+2 /* 1.421897 */, 0x1.2fd214p-3, 0x1.fa55aep-1,
  0x0.5bfe01800p+2 /* 1.437378 */, 0x1.106e24p-3, 0x1.fb732ap-1,
  0x0.5cf89f000p+2 /* 1.452675 */, 0x1.e2b3c0p-4, 0x1.fc6ea8p-1,
  0x0.5e00f4800p+2 /* 1.468808 */, 0x1.a104e8p-4, 0x1.fd56eap-1,
  0x0.5f0527000p+2 /* 1.484689 */, 0x1.60420cp-4, 0x1.fe1a64p-1,
  0x0.5fffc8000p+2 /* 1.499987 */, 0x1.21cb4cp-4, 0x1.feb78ap-1,
  0x0.610318000p+2 /* 1.515814 */, 0x1.c23092p-5, 0x1.ff39eep-1,
  0x0.62032d800p+2 /* 1.531444 */, 0x1.424a9cp-5, 0x1.ff9a86p-1,
  0x0.62fd46000p+2 /* 1.546709 */, 0x1.8a9d8cp-6, 0x1.ffd9fap-1,
  0x0.63fbc1800p+2 /* 1.562241 */, 0x1.1856c2p-7, 0x1.fffb34p-1,
  0x0.65011f000p+2 /* 1.578193 */, -0x1.e4c59ap-8, 0x1.fffc6ap-1,
};

/*@ requires 0 <= x <= 1.6 ; */
float my_sinf(float x)
{
  const float offs = 0x0.0b8p+2f;
  if (x < offs)
    {
      float xx = x * x;
      /* Remez-optimized polynomial for relative accuracy on -0.164 .. 0.164,
         Not the full -0.18 .. 0.18 where it is used, which makes it worse
         on -0.164 .. 0.164. But even optimized without regard for 0.164 .. 0.18
         It is better than the table entry + correction there so we use it there
      */
      return x + x * xx * (-0.16666660487324f + xx * 8.3259065018069e-3f);
    }
  int i = (x - offs) * 64.0f;
  float *p = c_cos_sin[i];
  float F = p[0];
  float C = p[1];
  float S = p[2];
  float h = x - F;
#if 0
  float s = S * (cosl(h) - 1.0) + C * sinl(h); // ext-double computation
#endif
#if 1
  // Two Remez-optimized polynomials for absolute accuracy on -0.008 .. 0.008
  float s =  h * (C + h * (-0.4999976959797f * S + h * -0.166666183241f * C));
#endif
  return S + s; 
}

unsigned int m, c, t;
uint64_t max_ulp;

int main(){
  for (float f = 0.0f; f < 1.57f; f = nextafterf(f, 3.0f))
    {
      double rd = sin(f);
      float r = rd;
      float n = my_sinf(f);
      t++;
      if (r != n)
        {
          c++;
          uint64_t in, ir;
          double nd = n;
          memcpy(&in, &nd, 8);
          memcpy(&ir, &rd, 8);
          uint64_t ulp = in > ir ? in - ir : ir - in;
          if (ulp > max_ulp)
            printf("error %" PRIu64 "/536870912 ULP sin(%.8e) ref:%.8e new:%.8e \n", 
                   ulp, f, r, n);
          if (ulp > max_ulp)
              max_ulp = ulp;
        }
    }
  printf("differences: %u / %u\n", c, t);
}
like image 89
Pascal Cuoq Avatar answered Nov 12 '22 15:11

Pascal Cuoq


Well, this formula is a start. Then other transformations could be done, depending on the context. I agree that if the formula sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h) is applied in the target precision, then the rounding error of sin(Cn) * cos(h) is up to 1/2 ulp of the result, which is bad if the goal is to get an accurate result. However some terms are sometimes expressed in greater precision by using pseudo-expansions. For instance, a number can be represented by a pair (a,b) where b is much smaller than a and whose value is regarded as a+b. In such a case, cos(h) could be represented by a pair (1,h') and the computation would be equivalent to what you suggest.

Alternatively, the implementation can be detailed once the formulas to evaluate cos(h) and sin(h) are given. See Section 3.1 in Stehlé and Zimmermann's paper you cited: they define C*(h) = C(h) − 1, and use C* in the final formula, which is basically what you suggest.

Note: I'm note sure that using the above formula is the best choice. One could start with sin(x) = sin(Cn) + error_term, and compute the error term in some other way.

like image 26
vinc17 Avatar answered Nov 12 '22 15:11

vinc17