I've noticed unusually long computation times when performing arithmetic operations on floating-point numbers of very small magnitude. The following simple code exhibits this behavior:
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
const int MAX_ITER = 100000000;
int main(int argc, char *argv[]){
    double x = 1.0, y;
    int i;
    clock_t t1, t2;

    scanf("%lf", &y);

    t1 = clock();
    for (i = 0; i < MAX_ITER; i++)
        x *= y;
    t2 = clock();

    printf("x = %lf\n", x);
    printf("Time: %.5lfsegs\n", ((double) (t2 - t1)) / CLOCKS_PER_SEC);
    return 0;
}
Here are two different runs of the program:
With y = 0.5
x = 0.000000
Time: 1.32000segs
With y = 0.9
x = 0.000000
Time: 19.99000segs
I'm using a laptop with the following specs to test the code:
Could someone explain in detail why this behavior occurs? I'm aware that with y = 0.9 the x value goes to 0 more slowly than with y = 0.5, so I suspect the problem is directly related to this.
Denormal (or rather subnormal) numbers are often a performance hit. Slowly converging to 0, per your second example, will generate more subnormals. Read more here and here. For more serious reading, check out the oft-cited (and very dense) What Every Computer Scientist Should Know About Floating-Point Arithmetic.
From the second source:
Under IEEE-754, floating point numbers are represented in binary as:
Number = signbit * mantissa * 2^exponent

There are potentially multiple ways of representing the same number; using decimal as an example, the number 0.1 could be represented as 1*10^-1 or 0.1*10^0 or even 0.01*10^1. The standard dictates that the numbers are always stored with the first bit as a one. In decimal that corresponds to the 1*10^-1 example.

Now suppose that the lowest exponent that can be represented is -100. So the smallest number that can be represented in normal form is 1*10^-100. However, if we relax the constraint that the leading bit be a one, then we can actually represent smaller numbers in the same space. Taking a decimal example, we could represent 0.1*10^-100. This is called a subnormal number. The purpose of having subnormal numbers is to smooth the gap between the smallest normal number and zero.
It is very important to realise that subnormal numbers are represented with less precision than normal numbers. In fact, they are trading reduced precision for their smaller size. Hence calculations that use subnormal numbers are not going to have the same precision as calculations on normal numbers. So an application which does significant computation on subnormal numbers is probably worth investigating to see if rescaling (i.e. multiplying the numbers by some scaling factor) would yield fewer subnormals, and more accurate results.
I was thinking about explaining it myself, but the explanation above is extremely well written and concise.
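To see this concretely, here is a minimal sketch of my own (not part of the quoted source; it assumes IEEE-754 doubles and C99) that counts how many of the intermediate products are subnormal, using fpclassify() from <math.h>. With y = 0.5 only a few dozen products are subnormal before x becomes exactly 0; with y = 0.9, once the subnormal range is reached every remaining product stays subnormal:

#include <math.h>
#include <stdio.h>

static void count_subnormals(double y, long iterations) {
    double x = 1.0;
    long hits = 0;
    long i;
    for (i = 0; i < iterations; i++) {
        x *= y;
        if (fpclassify(x) == FP_SUBNORMAL)   /* classify the current product */
            hits++;
    }
    printf("y = %.1f: final x = %g, subnormal products = %ld\n", y, x, hits);
}

int main(void) {
    count_subnormals(0.5, 10000);   /* x hits exactly 0 after ~1075 iterations */
    count_subnormals(0.9, 10000);   /* x gets stuck in the subnormal range */
    return 0;
}

(Compile with something like gcc -O2 -std=c99, and link with -lm if your toolchain requires it.)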
You get a measurable difference not because 0.9^n converges to 0 more slowly than 0.5^n mathematically, but because in IEEE-754 floating-point implementations it doesn't converge to 0 at all.

The smallest positive double in IEEE-754 representation is 2^-1074, and the smallest positive normal double is 2^-1022, so with y = 0.5 the loop encounters 52 subnormal values of x. Once the smallest positive subnormal is reached, the next product would be 2^-1075, but under the default round-to-nearest (ties-to-even) rounding mode that is rounded to 0. (IEEE-754 representation and round-to-nearest as the default rounding mode are pretty much ubiquitous on standard consumer hardware, even if the standard is not fully implemented.) From then on, you have the multiplication 0*y, which is an ordinary floating-point multiplication (and fast even if y is a subnormal number).
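For reference, these boundary values can be printed directly. A minimal sketch of my own (assuming IEEE-754 binary64 doubles; link with -lm if needed) shows the smallest normal and subnormal values, and that halving the smallest subnormal rounds to 0:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    double smallest_normal    = DBL_MIN;              /* 2^-1022 */
    double smallest_subnormal = nextafter(0.0, 1.0);  /* 2^-1074 */

    printf("smallest normal:    %g\n", smallest_normal);
    printf("smallest subnormal: %g\n", smallest_subnormal);
    printf("smallest subnormal * 0.5 = %g\n", smallest_subnormal * 0.5);  /* rounds to 0 */
    return 0;
}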
With 0.5 < y < 1, once you've reached the lower end of the (positive) subnormal range, the result of x*y rounds to the value of x again (for y = 0.9, the fixed point of the iteration is 5*2^-1074). Since that range is reached after a few thousand iterations (0.9^7 < 0.5, so x shrinks by at least a factor of 2 every 7 iterations), you're basically multiplying a subnormal number by a nonzero number for the entire loop. On many processors, such a multiplication can't be handled directly and has to be handled in microcode, which is a lot slower.
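You can watch the iteration get stuck at that fixed point with a small sketch of my own (again assuming IEEE-754 doubles and the default rounding mode):

#include <stdio.h>

/* Multiply x by y a fixed number of times, as in the question's loop. */
static double iterate(double y, long n) {
    double x = 1.0;
    long i;
    for (i = 0; i < n; i++)
        x *= y;
    return x;
}

int main(void) {
    printf("y = 0.5, 10000 iterations: x = %g\n", iterate(0.5, 10000));  /* exactly 0 */
    printf("y = 0.9, 10000 iterations: x = %g\n", iterate(0.9, 10000));  /* a tiny subnormal, about 2.5e-323 */
    printf("y = 0.9, 20000 iterations: x = %g\n", iterate(0.9, 20000));  /* same value: the fixed point */
    return 0;
}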
If speed is more important than IEEE-754 semantics (or if those are undesirable for other reasons), many compilers offer options to disable that behaviour and flush subnormal numbers to 0, if the hardware supports it. I couldn't find an option explicitly for that in my gcc's man page, but -ffast-math did the trick here.
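If you'd rather not pull in everything -ffast-math implies, on x86 with SSE you can set the flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits yourself. Here is a minimal sketch of my own (it assumes an x86 target where scalar double arithmetic goes through the SSE unit, and uses the _MM_SET_FLUSH_ZERO_MODE / _MM_SET_DENORMALS_ZERO_MODE intrinsics from <xmmintrin.h> and <pmmintrin.h>); note this deliberately gives up strict IEEE-754 semantics:

#include <stdio.h>
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

int main(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* subnormal results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* subnormal inputs are treated as 0 */

    double x = 1.0;
    long i;
    for (i = 0; i < 100000000; i++)
        x *= 0.9;
    printf("x = %g\n", x);   /* 0, and the y = 0.9 case now runs about as fast as y = 0.5 */
    return 0;
}

With these bits set, the slow path is never taken because the hardware never produces or consumes subnormals in this loop.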