Approximating cosine on [0,pi] using only single precision floating point

I'm currently working on an approximation of the cosine function. The final target is a self-developed device with a 32-bit floating-point ALU / LU and a specialized C compiler, so I am not able to use the C library math functions (cosf, ...). I'm aiming to code various methods that differ in accuracy and in the number of instructions / cycles.

I've already tried a lot of different approximation algorithms, starting from fdlibm, Taylor expansion, Padé approximation, the Remez algorithm using Maple, and so on.

But as soon as I implement them using only single-precision floats, there is a significant loss of accuracy. And to be clear: I know that with double precision, a much higher accuracy is no problem at all.

Right now, I have some approximations that are exact up to a few thousand ulp around pi/2 (the range where the largest errors occur), and I feel that I am limited by the single-precision conversions.

Regarding argument reduction: the input is in radians. I assume that an argument reduction would cause even more precision loss due to the divisions / multiplications involved. Since my overall input range is only 0..pi, I decided to reduce the argument to 0..pi/2.

Therefore my question is: does anybody know a single-precision approximation to the cosine function with high accuracy (and, in the best case, high efficiency)? Are there any algorithms that optimize approximations for single precision? Do you know whether the built-in cosf function computes its values with single or double precision internally?

#include <stdint.h>

float ua_cos_v2(float x)
{
    float output;
    float myPi = 3.1415927410125732421875f;
    if (x < 0) x = -x;
    int quad = (int32_t)(x * 0.63661977236f); // quad = x/(pi/2) = x*2/pi
    if (x < 1.58f && x > 1.57f) // handle the region around pi/2 separately
    {
        float d = x - 1.57079637050628662109375f; // x - pi/2 (binary32)
        float d2 = d * d;
        output = -d - 2.0e-12f * d2 + 0.16666667163372039794921875f * d2 * d
            + 2.0e-13f * d2 * d2
            + 0.000198412701138295233249664306640625f * d2 * d2 * d2 * d;
        output -= 4.37E-08f;
    }
    else {
        float param_x;
        int param_quad = -1;
        switch (quad)
        {
        case 0:
            param_x = x;
            break;
        case 1:
            param_x = myPi - x;
            param_quad = 1;
            break;
        case 2:
            param_x = x - myPi;
            break;
        case 3:
            param_x = 2 * myPi - x;
            break;
        }
        float c1 = 1.0f,
            c2 = -0.5f,
            c3 = 0.0416666679084300994873046875f,
            c4 = -0.001388888922519981861114501953125f,
            c5 = 0.00002480158218531869351863861083984375f,
            c6 = -2.75569362884198199026286602020263671875E-7f,
            c7 = 2.08583283978214240050874650478363037109375E-9f,
            c8 = -1.10807162057025010426514199934899806976318359375E-11f;
        float _x2 = param_x * param_x;
        output = c1 + _x2 * (c2 + _x2 * (c3 + _x2 * (c4 + _x2 * (c5 + _x2 * (c6 + _x2 * (c7 + _x2 * c8))))));
        if (param_quad == 1 || param_quad == 0)
            output = -output;
    }
    return output;
}
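
(A ulp error like the one quoted above can be measured on a host by comparing against double-precision cos(); the following is a minimal sketch, assuming IEEE-754 binary32 float and a full <math.h> on the host.)

#include <math.h>
#include <stdio.h>

/* minimal host-side sketch: estimate the error of ua_cos_v2 in ulps
   against double-precision cos(), sampling [0, pi] uniformly */
int main(void)
{
    float max_ulp = 0.0f, worst_x = 0.0f;
    for (int i = 0; i <= 1000000; i++) {
        float x = (float)(3.14159265358979323846 * i / 1000000);
        double ref = cos((double)x);
        /* size of one ulp at the magnitude of the reference result */
        float r = (float)fabs(ref);
        float ulp = nextafterf(r, INFINITY) - r;
        float err = (float)(fabs((double)ua_cos_v2(x) - ref) / ulp);
        if (err > max_ulp) { max_ulp = err; worst_x = x; }
    }
    printf("max error %.1f ulp at x = %.9e\n", max_ulp, worst_x);
    return 0;
}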


If I have forgotten any information, please do not hesitate to ask!

Thanks in advance

asked Jan 25 '23 by Dexter S


2 Answers

It is certainly possible to compute cosine on [0, π] with any desired error bound >= 0.5 ulp using only native-precision operations. However, the closer the target is to a correctly rounded function, the more up-front design work and runtime computational work are required.

Transcendental function implementations typically consist of argument reduction, a core approximation (or several), and a final fixup that counteracts the argument reduction. Where the argument reduction involves a subtraction, catastrophic cancellation must be avoided by explicitly or implicitly using higher precision. Implicit techniques can be designed to rely only on native-precision computation, for example by splitting a constant like π/2 into an unevaluated sum such as 1.57079637e+0f - 4.37113883e-8f when using IEEE-754 binary32 (single precision).

Achieving high accuracy with native-precision computation is much easier when the hardware provides a fused multiply-add (FMA) operation. OP did not specify whether their target platform provides this operation, so I will first show a very simple approach offering moderate accuracy (maximum error of about 5 ulp) that relies only on multiplies and adds. I assume hardware that adheres to the IEEE-754 standard, and that float maps to the IEEE-754 binary32 format.
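
To see why FMA helps, note that fmaf() rounds a*b+c only once, whereas a separate multiply and add round twice. A small hypothetical demonstration (assuming IEEE-754 binary32 float arithmetic, i.e. FLT_EVAL_METHOD == 0, and a correctly rounded fmaf()):

#include <math.h>
#include <stdio.h>

int main(void)
{
    float a = 1.0f + 0x1.0p-12f;  /* exactly representable */
    /* a*a = 1 + 2**-11 + 2**-24; the standalone product rounds away 2**-24 */
    printf("a*a - 1 = %a\n", a * a - 1.0f);       /* 0x1p-11 */
    printf("fmaf    = %a\n", fmaf(a, a, -1.0f));  /* 0x1.0008p-11, exact */
    return 0;
}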

The following is based on an archived blog post by Colin Wallace titled "Approximating sin(x) to 5 ULP with Chebyshev polynomials". It proposes to approximate sine on [-π, π] by a polynomial in x² that approximates sin(x)/(x*(x²-π²)), then multiplying the result by x*(x²-π²). A standard trick to compute a²-b² more accurately is to rewrite it as (a-b)*(a+b). Representing π as an unevaluated sum of two floating-point numbers pi_hi and pi_lo avoids catastrophic cancellation in the subtraction, which turns the computation of x²-π² into ((x - pi_hi) - pi_lo) * ((x + pi_hi) + pi_lo).
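
As a minimal sketch of that rewrite (assuming IEEE-754 binary32 float; pi_hi and pi_lo are the binary32 head and tail of π):

/* x*x - pi*pi evaluated as (x - pi)*(x + pi), with pi represented as
   the unevaluated sum pi_hi + pi_lo to avoid catastrophic cancellation */
const float pi_hi =  3.14159274e+0f; //  0x1.921fb6p+1
const float pi_lo = -8.74227766e-8f; // -0x1.777a5cp-24
float xxmpp = ((x - pi_hi) - pi_lo) * ((x + pi_hi) + pi_lo); /* ~= x*x - pi*pi */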

The core approximation should ideally be a minimax polynomial, which minimizes the maximum error; I have used one here. Standard tools like Maple or Mathematica can generate it, or one can create one's own code based on the Remez algorithm.

For a cosine computation on [0, π] we can make use of the fact that cos(t) = sin(π/2 - t). Substituting x = π/2 - t into x * (x - π) * (x + π) yields (π/2 - t) * (-π/2 - t) * (3π/2 - t). The constants can be split into high and low parts (or head and tail, to use another common idiom) as before.

/* Approximate cosine on [0, PI] with maximum error of 5.081154 ulp */
float cosine (float x)
{
    const float half_pi_hi       =  1.57079637e+0f; //  0x1.921fb6p+0
    const float half_pi_lo       = -4.37113883e-8f; // -0x1.777a5cp-25
    const float three_half_pi_hi =  4.71238899e+0f; //  0x1.2d97c8p+2
    const float three_half_pi_lo = -1.19248806e-8f; // -0x1.99bc5cp-27
    float p, s, hpmx, thpmx, nhpmx;

    /* cos(x) = sin (pi/2 - x) = sin (hpmx) */
    hpmx = (half_pi_hi - x) + half_pi_lo;               // pi/2 - x
    thpmx = (three_half_pi_hi - x) + three_half_pi_lo;  // 3*pi/2 - x
    nhpmx = (-half_pi_hi - x) - half_pi_lo;             // -pi/2 - x

    /* P(hpmx*hpmx) ~= sin (hpmx) / (hpmx * (hpmx * hpmx - pi * pi)) */
    s = hpmx * hpmx;
    p =         1.32823530e-10f;//  0x1.241500p-33
    p = p * s - 2.33173445e-8f; // -0x1.9096c4p-26 
    p = p * s + 2.52237896e-6f; //  0x1.528c48p-19
    p = p * s - 1.73501656e-4f; // -0x1.6bdbfep-13
    p = p * s + 6.62087509e-3f; //  0x1.b1e7dap-8
    p = p * s - 1.01321183e-1f; // -0x1.9f02f6p-4
    return hpmx * nhpmx * thpmx * p;
}
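
A quick host-side sanity check (not an exhaustive ulp measurement) might compare a few sample points against double-precision cos(); a hypothetical sketch:

#include <math.h>
#include <stdio.h>

int main(void)
{
    for (int i = 0; i <= 4; i++) {
        float x = (float)(3.14159265358979323846 * i / 4);
        printf("cosine(%.7e) = % .8e  (ref % .8e)\n",
               x, cosine(x), (float)cos((double)x));
    }
    return 0;
}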

Below I am showing a classical approach which first reduces the argument into [-π/4, π/4] while recording the quadrant. The quadrant then tells us whether we need to compute a polynomial approximation to the sine or the cosine on this primary approximation interval, and whether we need to flip the sign of the final result. This code assumes that the target platform supports the FMA operation specified by IEEE-754, and that it is mapped via the standard C function fmaf() for single precision.

The code is straightforward except for the float-to-int conversion with rounding mode to-nearest-or-even that is used to compute the quadrant; it is performed by the "magic number addition" method and combined with the multiplication by 2/π (equivalent to division by π/2). The maximum error is less than 1.5 ulp.
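
Isolated, the trick looks like this (a sketch assuming IEEE-754 binary32 float, round-to-nearest-even, and no extended intermediate precision):

/* "magic number addition": for |x| < 2**22, adding 1.5 * 2**23 pushes
   the fraction bits out of the significand, so the sum holds x rounded
   to the nearest integer (ties to even); subtracting restores the value */
float round_rne(float x)
{
    return (x + 12582912.0f) - 12582912.0f;
}

In my_cosf below, the multiplication by 2/π and this addition are fused into a single fmaf().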

/* compute cosine on [0, PI] with maximum error of 1.429027 ulp */
float my_cosf (float a)
{
    const float half_pi_hi =  1.57079637e+0f; //  0x1.921fb6p+0
    const float half_pi_lo = -4.37113883e-8f; // -0x1.777a5cp-25
    float c, j, r, s, sa, t;
    int i;

    /* subtract closest multiple of pi/2 giving reduced argument and quadrant */
    j = fmaf (a, 6.36619747e-1f, 12582912.f) - 12582912.f; // 2/pi, 1.5 * 2**23
    a = fmaf (j, -half_pi_hi, a);
    a = fmaf (j, -half_pi_lo, a);

    /* phase shift of pi/2 (one quadrant) for cosine */
    i = (int)j;
    i = i + 1;

    sa = a * a;
    /* Approximate cosine on [-PI/4,+PI/4] with maximum error of 0.87444 ulp */
    c =               2.44677067e-5f;  //  0x1.9a8000p-16
    c = fmaf (c, sa, -1.38877297e-3f); // -0x1.6c0efap-10
    c = fmaf (c, sa,  4.16666567e-2f); //  0x1.555550p-5
    c = fmaf (c, sa, -5.00000000e-1f); // -0x1.000000p-1
    c = fmaf (c, sa,  1.00000000e+0f); //  1.00000000p+0
    /* Approximate sine on [-PI/4,+PI/4] with maximum error of 0.64196 ulp */
    s =               2.86567956e-6f;  //  0x1.80a000p-19
    s = fmaf (s, sa, -1.98559923e-4f); // -0x1.a0690cp-13
    s = fmaf (s, sa,  8.33338592e-3f); //  0x1.111182p-7
    s = fmaf (s, sa, -1.66666672e-1f); // -0x1.555556p-3
    t = a * sa;
    s = fmaf (s, t, a);

    /* select sine approximation or cosine approximation based on quadrant */
    r = (i & 1) ? c : s;
    /* adjust sign based on quadrant */
    r = (i & 2) ? (0.0f - r) : r;

    return r;
}

As it turns out, in this particular case the use of FMA provides only a tiny benefit in terms of accuracy. If I replace calls to fmaf(a,b,c) with ((a)*(b)+(c)), the maximum error increases minimally to 1.451367 ulps, that is, it stays below 1.5 ulps.
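
On a target without FMA, that substitution amounts to a plain multiply-add helper (a sketch; unlike fmaf(), it rounds twice):

/* non-fused fallback; per the remark above, using this in place of
   fmaf() in my_cosf raises the maximum error to ~1.45 ulp */
static inline float mul_add(float a, float b, float c)
{
    return a * b + c;
}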

answered Feb 01 '23 by njuffa


I see @njuffa has a good approach, yet I want to pose another approach, given that:

  • The angle is likely originally in degrees, not radians; take advantage of that.
  • It does not depend on float being IEEE.
  • fma() support may be weak, so do not use it.

Perform range reduction using integer math, then find the answer via a self-adjusting Taylor series.

#include <assert.h>

static float my_sinf_helper(float xx, float term, unsigned n) {
  if (term + 1.0f == 1.0f) { // term negligible relative to 1: stop
    return term;
  }
  return term - my_sinf_helper(xx, xx * term / ((n + 1) * (n + 2)), n + 2);
}

static float my_cosf_helper(float xx, float term, unsigned n) {
  if (term + 1.0f == 1.0f) { // term negligible relative to 1: stop
    return term;
  }
  return term - xx * my_cosf_helper(xx, term / ((n + 1) * (n + 2)), n + 2);
}

// valid for [-pi/4, +pi/4]
static float my_sinf_primary(float x) {
  return x * my_sinf_helper(x * x, 1.0f, 1);
}

// valid for [-pi/4, +pi/4]
static float my_cosf_primary(float x) {
  return my_cosf_helper(x * x, 1.0f, 0);
}

#define MY_PIf 3.1415926535897932384626433832795f
#define D2Rf(d) ((d)*(MY_PIf/180))

float my_cosdf(float x) {
  if (x < 0) {x = -x;}
  unsigned long long ux = (unsigned long long) x;
  x -= (float) ux;
  unsigned ux_primary = ux % 360u;
  int uxq = ux_primary%90;
  if (uxq >= 45) uxq -= 90;
  x += uxq;
  switch (ux_primary/45) {
    case 7: //
    case 0: return my_cosf_primary(D2Rf(x));
    case 1: //
    case 2: return -my_sinf_primary(D2Rf(x));
    case 3: //
    case 4: return -my_cosf_primary(D2Rf(x));
    case 5: //
    case 6: return my_sinf_primary(D2Rf(x));
  }
  assert(0);
  return 0;
}

Test code

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define DBL_FMT "%+24.17e"

typedef struct {
  double x, y0, y1, adiff;
  unsigned n;
} test;

test worst = {0};

int my_cosd_test(float x) {
  test t;
  t.x = x;
  t.y0 = cos(x*acos(-1)/180);
  t.y1 = my_cosdf(x);
  t.adiff = fabs(t.y1 - t.y0);
  if (t.adiff > worst.adiff) {
    t.n = worst.n + 1;
    printf("n:%3u x:" DBL_FMT " y0:" DBL_FMT " y1:" DBL_FMT " d:" DBL_FMT "\n", //
        t.n, t.x, t.y0, t.y1, t.adiff);
    fflush(stdout);
    worst = t;
    if (t.n > 100)
      exit(-1);
  }
  return t.adiff != 0.0;
}

float rand_float_finite(void) {
  union {
    float f;
    unsigned char uc[sizeof(float)];
  } u;
  do {
    for (size_t i = 0; i < sizeof u.uc / sizeof u.uc[0]; i++) {
      u.uc[i] = (unsigned char) rand();
    }
  } while (!isfinite(u.f) || fabs(u.f) > 5000);
  return u.f;
}

int my_cosd_tests(unsigned n) {
  my_cosd_test(0.0);
  for (unsigned i = 0; i < n; i++) {
    my_cosd_test(rand_float_finite());
  }
  return 0;
}

int main(void) {
  my_cosd_tests(1000000);
}

Worst case error: +8.2e-08. Maximum recursion depth: 6.

n: 14 x:+3.64442993164062500e+03 y0:+7.14107074054115110e-01 y1:+7.14107155799865723e-01 d:+8.17457506130381262e-08

I'll review more later. More extensive testing reaches a worst-case error of about 9e-08, and there is some TBD issue with x greater than about 1e10.

answered Feb 01 '23 by chux - Reinstate Monica