Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementation of sinpi() and cospi() using standard C math library

The function sinpi(x) computes sin(πx), and the function cospi(x) computes cos(πx), where the multiplication with π is implicit inside the functions. These functions were initially introduced into the C standard math library as an extension by Sun Microsystems in the late 1980s. IEEE Std 754™-2008 specifies the equivalent functions sinPi and cosPi in section 9.

There are numerous computations where sin(πx) and cos(πx) occur naturally. A very simple example is the Box-Muller transform (G. E. P. Box and Mervin E. Muller, "A Note on the Generation of Random Normal Deviates". The Annals of Mathematical Statistics, Vol. 29, No. 2, pp. 610 - 611), which, given two independent random variables U₁ and U₂ with uniform distribution, produces independent random variables Z₁ and Z₂ with standard normal distribution:

Z₁ = √(-2 ln U₁) cos (2 π U₂)
Z₂ = √(-2 ln U₁) sin (2 π U₂)

A further example is the computation of sine and cosine for degree arguments, as in this computation of great-circle distance using the Haversine formula:

/* This function computes the great-circle distance of two points on earth 
   using the Haversine formula, assuming spherical shape of the planet. A 
   well-known numerical issue with the formula is reduced accuracy in the 
   case of near antipodal points.

   lat1, lon1  latitude and longitude of first point, in degrees [-90,+90]
   lat2, lon2  latitude and longitude of second point, in degrees [-180,+180]
   radius      radius of the earth in user-defined units, e.g. 6378.2 km or 
               3963.2 miles

   returns:    distance of the two points, in the same units as radius

   Reference: http://en.wikipedia.org/wiki/Great-circle_distance
*/
double haversine (double lat1, double lon1, double lat2, double lon2, double radius)
{
    double dlat, dlon, c1, c2, d1, d2, a, c, t;

    c1 = cospi (lat1 / 180.0);
    c2 = cospi (lat2 / 180.0);
    dlat = lat2 - lat1;
    dlon = lon2 - lon1;
    d1 = sinpi (dlat / 360.0);
    d2 = sinpi (dlon / 360.0);
    t = d2 * d2 * c1 * c2;
    a = d1 * d1 + t;
    c = 2.0 * asin (fmin (1.0, sqrt (a)));
    return radius * c;
}

For C++, the Boost library provides sin_pi and cos_pi, and some vendors offer sinpi and cospi functionality as extensions in system libraries. For example, Apple added __sinpi, __cospi and the corresponding single-precision versions __sinpif, __cospif to iOS 7 and OS X 10.9 (presentation, slide 101). But for many other platforms, there is no implementation readily accessible to C programs.

Compared with a traditional approach that uses e.g. sin (M_PI * x) and cos (M_PI * x), the use of sinpi and cospi improves accuracy by reducing rounding error via the internal multiplication with π, and also offers performance advantages due to the much simpler argument reduction.

How can one use the standard C math library to implement sinpi() and cospi() functionality in a reasonably efficient and standard compliant fashion?

like image 1000
njuffa Avatar asked Mar 14 '17 17:03

njuffa


1 Answers

For simplicity, I will focus on sincospi(), which simultaneously provides both the sine and the cosine results. sinpi and cospi can then be constructed as wrapper functions that discard unneeded data. In many applications, the handling of floating-point flags (see fenv.h) is not required, nor do we need errno error reporting most of the time, so I will omit these.

The basic algorithmic structure is straightforward. As very large arguments are always even integers, and therefore thus multiples of 2π, their sine and cosine values are well-known. Other arguments are folded into range [-¼,+¼] while recording quadrant information. Polynomial minimax approximations are used to compute sine and cosine on the primary approximation interval. Finally, quadrant data is used to map the preliminary results to the final result by cyclical exchange of results and sign change.

The correct handling of special operands (in particular -0, infinities, and NaNs) requires the compiler to apply only optimizations that comply with IEEE-754 rules. It may not transform x*0.0 into 0.0 (this is not correct for -0, infinities, and NaNs) nor may it optimize 0.0-x into -x as negation is a bit-level operation according to section 5.5.1 of IEEE-754 (yielding different results for zeros and NaNs). Most compilers will offer a flag that enforces the use of "safe" transformations, e.g. -fp-model=precise for the Intel C/C++ compiler.

One additional caveat applies to the use of the nearbyint function during argument reduction. Like rint, this function is specified to round according to the current rounding mode. When fenv.h isn't used, the rounding mode defaults to round "to-nearest-or-even". When it is used, there is a risk that a directed rounding mode is in effect. This could be worked around by the use of round, which always provides the rounding mode "round to nearest, ties away from zero" independent of current rounding mode. However, this function will tend to be slower since it is not supported by an equivalent machine instruction on most processor architectures.

A note on performance: The C99 code below relies heavily on the use of fma(), which implements a fused multiply-add operation. On most modern hardware architectures, this is directly supported by a corresponding hardware instruction. Where this is not the case, the code may experience significant slow-down due to generally slow FMA emulation.

 #include <math.h>
 #include <stdint.h>

/* Writes result sine result sin(πa) to the location pointed to by sp
   Writes result cosine result cos(πa) to the location pointed to by cp

   In extensive testing, no errors > 0.97 ulp were found in either the sine
   or cosine results, suggesting the results returned are faithfully rounded.
*/
void my_sincospi (double a, double *sp, double *cp)
{
    double c, r, s, t, az;
    int64_t i;

    az = a * 0.0; // must be evaluated with IEEE-754 semantics
    /* for |a| >= 2**53, cospi(a) = 1.0, but cospi(Inf) = NaN */
    a = (fabs (a) < 9.0071992547409920e+15) ? a : az;  // 0x1.0p53
    /* reduce argument to primary approximation interval (-0.25, 0.25) */
    r = nearbyint (a + a); // must use IEEE-754 "to nearest" rounding
    i = (int64_t)r;
    t = fma (-0.5, r, a);
    /* compute core approximations */
    s = t * t;
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */
    r =            -1.0369917389758117e-4;
    r = fma (r, s,  1.9294935641298806e-3);
    r = fma (r, s, -2.5806887942825395e-2);
    r = fma (r, s,  2.3533063028328211e-1);
    r = fma (r, s, -1.3352627688538006e+0);
    r = fma (r, s,  4.0587121264167623e+0);
    r = fma (r, s, -4.9348022005446790e+0);
    c = fma (r, s,  1.0000000000000000e+0);
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */
    r =             4.6151442520157035e-4;
    r = fma (r, s, -7.3700183130883555e-3);
    r = fma (r, s,  8.2145868949323936e-2);
    r = fma (r, s, -5.9926452893214921e-1);
    r = fma (r, s,  2.5501640398732688e+0);
    r = fma (r, s, -5.1677127800499516e+0);
    s = s * t;
    r = r * s;
    s = fma (t, 3.1415926535897931e+0, r);
    /* map results according to quadrant */
    if (i & 2) {
        s = 0.0 - s; // must be evaluated with IEEE-754 semantics
        c = 0.0 - c; // must be evaluated with IEEE-754 semantics
    }
    if (i & 1) { 
        t = 0.0 - s; // must be evaluated with IEEE-754 semantics
        s = c;
        c = t;
    }
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
    if (a == floor (a)) s = az;
    *sp = s;
    *cp = c;
}

The single-precision version differs basically only in the core approximations. Using exhaustive testing allows the precise determination of errors bounds.

#include <math.h>
#include <stdint.h>

/* Writes result sine result sin(πa) to the location pointed to by sp
   Writes result cosine result cos(πa) to the location pointed to by cp

   In exhaustive testing, the maximum error in sine results was 0.96677 ulp,
   the maximum error in cosine results was 0.96563 ulp, meaning results are
   faithfully rounded.
*/
void my_sincospif (float a, float *sp, float *cp)
{
    float az, t, c, r, s;
    int32_t i;

    az = a * 0.0f; // must be evaluated with IEEE-754 semantics
    /* for |a| > 2**24, cospi(a) = 1.0f, but cospi(Inf) = NaN */
    a = (fabsf (a) < 0x1.0p24f) ? a : az;
    r = nearbyintf (a + a); // must use IEEE-754 "to nearest" rounding
    i = (int32_t)r;
    t = fmaf (-0.5f, r, a);
    /* compute core approximations */
    s = t * t;
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */
    r =              0x1.d9e000p-3f;
    r = fmaf (r, s, -0x1.55c400p+0f);
    r = fmaf (r, s,  0x1.03c1cep+2f);
    r = fmaf (r, s, -0x1.3bd3ccp+2f);
    c = fmaf (r, s,  0x1.000000p+0f);
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */
    r =             -0x1.310000p-1f;
    r = fmaf (r, s,  0x1.46737ep+1f);
    r = fmaf (r, s, -0x1.4abbfep+2f);
    r = (t * s) * r;
    s = fmaf (t, 0x1.921fb6p+1f, r);
    if (i & 2) {
        s = 0.0f - s; // must be evaluated with IEEE-754 semantics
        c = 0.0f - c; // must be evaluated with IEEE-754 semantics
    }
    if (i & 1) {
        t = 0.0f - s; // must be evaluated with IEEE-754 semantics
        s = c;
        c = t;
    }
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
    if (a == floorf (a)) s = az;
    *sp = s;
    *cp = c;
}
like image 171
njuffa Avatar answered Sep 21 '22 16:09

njuffa