
sine cosine modular extended precision arithmetic

I've seen in many implementations of sine/cosine a so-called extended modular precision arithmetic. But what is it for? For instance, in the Cephes implementation, after reduction to the range [0, pi/4], this modular precision arithmetic is applied to improve the precision.

Here is the code:

z = ((x - y * DP1) - y * DP2) - y * DP3;

where DP1, DP2 and DP3 are hardcoded coefficients. How can those coefficients be derived mathematically? I understand the purpose of "extended modular precision arithmetic" for bignums, but what is its exact purpose here?

Kenzo Lespagnol asked Feb 25 '17 10:02

1 Answer

In the context of argument reduction for trigonometric functions, what you are looking at is Cody-Waite argument reduction, a technique introduced in the book: William J. Cody and William Waite, Software Manual for the Elementary Functions, Prentice-Hall, 1980. The goal is to achieve, for arguments up to a certain magnitude, an accurate reduced argument, despite subtractive cancellation in intermediate computation. For this purpose, the relevant constant is represented with more than native precision, by using a sum of multiple numbers of decreasing magnitude (here: DP1, DP2, DP3), such that all of the intermediate products except the least significant one can be computed without rounding error.

Consider as an example the computation of sin(113) in IEEE-754 binary32 (single precision). The typical argument reduction would conceptually compute i=rintf(x/(π/2)); reduced_x = x-i*(π/2). The binary32 number closest to π/2 is 0x1.921fb6p+0. We compute i=72; the product i*(π/2) rounds to 0x1.c463acp+6, which is close to the argument x=0x1.c40000p+6. During subtraction, some leading bits cancel, and we wind up with reduced_x = -0x1.8eb000p-4. Note the trailing zeros introduced by renormalization. These zero bits carry no useful information. Applying an accurate approximation to the reduced argument yields sin(x) = -0x1.8e0eeap-4, whereas the true result is -0x1.8e0e9d39...p-4. We wind up with large relative error and large ulp error.

We can remedy this by using a two-step Cody-Waite argument reduction. For example, we could use pio2_hi = 0x1.921f00p+0, and pio2_lo = 0x1.6a8886p-17. Note the eight trailing zero bits in the single-precision representation of pio2_hi, which allow us to multiply with any 8-bit integer i and still have the product i * pio2_hi representable exactly as a single-precision number. When we compute ((x - i * pio2_hi) - i * pio2_lo), we get reduced_x = -0x1.8eafb4p-4, and therefore sin(x) = -0x1.8e0e9ep-4, a quite accurate result.

The best way to split the constant into a sum will depend on the magnitude of i we need to handle, on the maximum number of bits subject to subtractive cancellation for a given argument range (based on how close integer multiples of π/2 can get to integers), and performance considerations. Typical real-life use cases involve two- to four-stage Cody-Waite reduction schemes. The availability of fused multiply-add (FMA) allows the use of constituent constants with fewer trailing zero bits. See this paper: Sylvie Boldo, Marc Daumas, and Ren-Cang Li, "Formally verified argument reduction with a fused multiply-add." IEEE Transactions on Computers, 58:1139-1145, 2009. For a worked example using fmaf() you might want to look at the code in one of my previous answers.

njuffa answered Sep 18 '22 18:09