I've noticed unusually long computation times when performing arithmetic operations on floating-point numbers of very small magnitude. The following simple code exhibits this behavior:
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
const int MAX_ITER = 100000000;
int main(int argc, char *argv[]){
    double x = 1.0, y;
    int i;
    clock_t t1, t2;

    scanf("%lf", &y);

    t1 = clock();
    for (i = 0; i < MAX_ITER; i++)
        x *= y;
    t2 = clock();

    printf("x = %lf\n", x);
    printf("Time: %.5lfsegs\n", ((double) (t2 - t1)) / CLOCKS_PER_SEC);
    return 0;
}
Here are two different runs of the program:
With y = 0.5
x = 0.000000
Time: 1.32000segs
With y = 0.9
x = 0.000000
Time: 19.99000segs
I'm using a laptop with the following specs to test the code:
Could someone explain in detail why this behavior occurs? I'm aware that with y = 0.9 the x value goes to 0 more slowly than with y = 0.5, so I suspect the problem is directly related to this.
Denormal (or rather subnormal) numbers are often a performance hit. Slowly converging to 0, per your second example, will generate more subnormals. Read more here and here. For more serious reading, check out the oft-cited (and very dense) What Every Computer Scientist Should Know About Floating-Point Arithmetic.
From the second source:
Under IEEE-754, floating point numbers are represented in binary as:
Number = signbit * mantissa * 2^exponent

There are potentially multiple ways of representing the same number; using decimal as an example, the number 0.1 could be represented as 1*10^-1 or 0.1*10^0 or even 0.01*10^1. The standard dictates that the numbers are always stored with the first bit as a one. In decimal that corresponds to the 1*10^-1 example.

Now suppose that the lowest exponent that can be represented is -100. So the smallest number that can be represented in normal form is 1*10^-100. However, if we relax the constraint that the leading bit be a one, then we can actually represent smaller numbers in the same space. Taking a decimal example, we could represent 0.1*10^-100. This is called a subnormal number. The purpose of having subnormal numbers is to smooth the gap between the smallest normal number and zero.
It is very important to realise that subnormal numbers are represented with less precision than normal numbers. In fact, they are trading reduced precision for their smaller size. Hence calculations that use subnormal numbers are not going to have the same precision as calculations on normal numbers. So an application which does significant computation on subnormal numbers is probably worth investigating to see if rescaling (i.e. multiplying the numbers by some scaling factor) would yield fewer subnormals, and more accurate results.
I was thinking about explaining it myself, but the explanation above is extremely well written and concise.
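To see this concretely, here is a minimal sketch of my own (not part of the quoted source; it assumes IEEE-754 doubles and C99) that counts how many of the intermediate products are subnormal, using fpclassify() from <math.h>. With y = 0.5 only a few dozen products are subnormal before x becomes exactly 0; with y = 0.9, once the subnormal range is reached every remaining product stays subnormal:

#include <math.h>
#include <stdio.h>

static void count_subnormals(double y, long iterations) {
    double x = 1.0;
    long hits = 0;
    long i;
    for (i = 0; i < iterations; i++) {
        x *= y;
        if (fpclassify(x) == FP_SUBNORMAL)   /* classify the current product */
            hits++;
    }
    printf("y = %.1f: final x = %g, subnormal products = %ld\n", y, x, hits);
}

int main(void) {
    count_subnormals(0.5, 10000);   /* x hits exactly 0 after ~1075 iterations */
    count_subnormals(0.9, 10000);   /* x gets stuck in the subnormal range */
    return 0;
}

(Compile with something like gcc -O2 -std=c99, and link with -lm if your toolchain requires it.)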
You get a measurable difference not because 0.9^n converges to 0 more slowly than 0.5^n mathematically, but because in IEEE-754 floating-point implementations it doesn't converge to 0 at all.

The smallest positive double in IEEE-754 representation is 2^-1074, and the smallest positive normal double is 2^-1022, so with y = 0.5 the loop encounters 52 subnormal values of x. Once the smallest positive subnormal is reached, the next product would be 2^-1075, but under the default round-to-nearest (ties-to-even) rounding mode that is rounded to 0. (IEEE-754 representation and round-to-nearest as the default rounding mode are pretty much ubiquitous on standard consumer hardware, even if the standard is not fully implemented.) From then on, you have the multiplication 0*y, which is an ordinary floating-point multiplication (and fast even if y is a subnormal number).
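For reference, these boundary values can be printed directly. A minimal sketch of my own (assuming IEEE-754 binary64 doubles; link with -lm if needed) shows the smallest normal and subnormal values, and that halving the smallest subnormal rounds to 0:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    double smallest_normal    = DBL_MIN;              /* 2^-1022 */
    double smallest_subnormal = nextafter(0.0, 1.0);  /* 2^-1074 */

    printf("smallest normal:    %g\n", smallest_normal);
    printf("smallest subnormal: %g\n", smallest_subnormal);
    printf("smallest subnormal * 0.5 = %g\n", smallest_subnormal * 0.5);  /* rounds to 0 */
    return 0;
}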
With 0.5 < y < 1, once you've reached the lower end of the (positive) subnormal range, the result of x*y rounds to the value of x again (for y = 0.9, the fixed point of the iteration is 5*2^-1074). Since that range is reached after a few thousand iterations (0.9^7 < 0.5, so x shrinks by at least a factor of 2 every 7 iterations), you're basically multiplying a subnormal number by a nonzero number for the entire loop. On many processors, such a multiplication can't be handled directly and has to be handled in microcode, which is a lot slower.
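You can watch the iteration get stuck at that fixed point with a small sketch of my own (again assuming IEEE-754 doubles and the default rounding mode):

#include <stdio.h>

/* Multiply x by y a fixed number of times, as in the question's loop. */
static double iterate(double y, long n) {
    double x = 1.0;
    long i;
    for (i = 0; i < n; i++)
        x *= y;
    return x;
}

int main(void) {
    printf("y = 0.5, 10000 iterations: x = %g\n", iterate(0.5, 10000));  /* exactly 0 */
    printf("y = 0.9, 10000 iterations: x = %g\n", iterate(0.9, 10000));  /* a tiny subnormal, about 2.5e-323 */
    printf("y = 0.9, 20000 iterations: x = %g\n", iterate(0.9, 20000));  /* same value: the fixed point */
    return 0;
}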
If speed is more important than IEEE-754 semantics (or if those are undesirable for other reasons), many compilers offer options to disable that behaviour and flush subnormal numbers to 0, if the hardware supports it. I couldn't find an option explicitly for that in my gcc's man page, but -ffast-math did the trick here.
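If you'd rather not pull in everything -ffast-math implies, on x86 with SSE you can set the flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits yourself. Here is a minimal sketch of my own (it assumes an x86 target where scalar double arithmetic goes through the SSE unit, and uses the _MM_SET_FLUSH_ZERO_MODE / _MM_SET_DENORMALS_ZERO_MODE intrinsics from <xmmintrin.h> and <pmmintrin.h>); note this deliberately gives up strict IEEE-754 semantics:

#include <stdio.h>
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

int main(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* subnormal results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* subnormal inputs are treated as 0 */

    double x = 1.0;
    long i;
    for (i = 0; i < 100000000; i++)
        x *= 0.9;
    printf("x = %g\n", x);   /* 0, and the y = 0.9 case now runs about as fast as y = 0.5 */
    return 0;
}

With these bits set, the slow path is never taken because the hardware never produces or consumes subnormals in this loop.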