In my code I often compute things like the following piece (here C code for simplicity):
float cos_theta = /* some simple operations; no cosf call! */;
float sin_theta = sqrtf(1.0f - cos_theta * cos_theta); // Option 1
For this example, ignore that the argument of the square root might become slightly negative due to imprecision; I fixed that with an additional fdimf call. However, I wondered whether the following is more precise:
float sin_theta = sqrtf((1.0f + cos_theta) * (1.0f - cos_theta)); // Option 2
cos_theta is between -1 and +1, so for each choice there will be situations where I subtract similar numbers and thus lose precision, right? Which is the most precise, and why?
The difference of two squares identity is (a + b)(a - b) = a^2 - b^2.
Addition and subtraction may require shifting one mantissa so the exponents match, which can discard some (or all) of the digits of the smaller operand. Multiplication: the product of two n-digit numbers is a 2n-digit number, so digits are lost again, and the product might not even be representable.
We use the quadratic formula to solve for h. The quadratic formula itself can be a cause of loss of significance if the quantity 4ac is very small. This can be remedied by choosing the sign that avoids subtracting: x1 = (-b - sqrt(b^2 - 4ac)) / (2a), or, in this case, h1 = (-2x - sqrt(4x^2 + 4x^2)) / 2 = -(1 + sqrt(2))x.
The most precise way with floats is likely to compute both sin and cos using a single x87 instruction, fsincos.
However, if you need to do the computation manually, it is best to group terms of similar magnitude. This means the second option is more precise, especially when cos_theta
is close to 0, where precision matters the most.
As the article What Every Computer Scientist Should Know About Floating-Point Arithmetic notes:
The expression x^2 - y^2 is another formula that exhibits catastrophic cancellation. It is more accurate to evaluate it as (x - y)(x + y).
Edit: it's more complicated than this. Although the above is generally true, (x - y)(x + y) is slightly less accurate when x and y are of very different magnitudes, as the footnote to the statement explains:
In this case, (x - y)(x + y) has three rounding errors, but x^2 - y^2 has only two since the rounding error committed when computing the smaller of x^2 and y^2 does not affect the final subtraction.
In other words, taking x - y, x + y, and the product (x - y)(x + y) each introduce rounding errors (3 steps of rounding error). x^2, y^2, and the subtraction x^2 - y^2 also each introduce rounding errors, but the rounding error from squaring a relatively small number (the smaller of x and y) is so negligible that there are effectively only two steps of rounding error, making the difference of squares more precise.
So option 1 is actually going to be more precise. This is confirmed by dev.brutus's Java test.