I am seeing discontinuities in the sequence of execution times as the input size varies. Specifically, I have been timing this code:
#include <iostream>
#include <sstream>
#include <cstdlib>
#include <ctime>
using namespace std;

long double a[2000][2000];
int iter = 0;

int main(int argc, char const *argv[]) {
    istringstream is(argv[1]);
    int N;
    is >> N;
    for (int i = 0; i <= N; ++i) {
        for (int J = 0; J <= N; ++J) {
            a[i][J] = (rand() % 3 + 1) * (rand() % 4 + 1);
        }
    }
    clock_t clk = clock();
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i) {
            a[i][k] = a[i][k] / a[k][k];
        }
        for (int i = k + 1; i < N; ++i) {
            for (int j = k + 1; j < N; ++j) {
                iter++;
                a[i][j] = a[i][j] - a[i][k] * a[k][j];
            }
        }
    }
    clk = clock() - clk;
    cout << "Time: " << ((double)clk) / CLOCKS_PER_SEC << "\n";
    cout << iter << endl;
}
I am compiling it with g++ 5.4.1 as C++14.
I tried the code for various values of N, and something really weird happens around N = 500. The execution times (as output by the program) are listed below:
N = 200 : 0.022136
N = 300 : 0.06792
N = 400 : 0.149622
N = 500 : 11.8341
N = 600 : 0.508186
N = 700 : 0.805481
N = 800 : 1.2062
N = 900 : 1.7092
N = 1000 : 2.35809
I ran the N = 500 case many times, and also on another machine, and got similar results.
Zooming in around N = 500, the timings are:
N = 494 : 0.282626
N = 495 : 0.284564
N = 496 : 11.5308
N = 497 : 0.288031
N = 498 : 0.289903
N = 499 : 11.9615
N = 500 : 12.4032
N = 501 : 0.293737
N = 502 : 0.295729
N = 503 : 0.297859
N = 504 : 12.4154
N = 505 : 0.301002
N = 506 : 0.304718
N = 507 : 12.4385
Why is this happening?
Your program can run into floating-point overflows and operations that produce NaN for certain inputs: if one calculation results in infinity/NaN, it spreads through your algorithm, so almost all of the numbers become infinity/NaN. Whether that happens depends on rand()'s output; if you change the seed with srand(), you may not get a slowdown for the N = 500 case.
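One way to check this (my own sketch, not part of the question's program; the seed 12345 is arbitrary) is to rerun the elimination and count the entries that end up non-finite:

#include <cmath>
#include <cstdlib>
#include <iostream>

long double a[2000][2000];

int main() {
    const int N = 500;
    srand(12345); // arbitrary seed; the slowdown depends on rand()'s output
    for (int i = 0; i <= N; ++i)
        for (int j = 0; j <= N; ++j)
            a[i][j] = (rand() % 3 + 1) * (rand() % 4 + 1);
    // Same elimination as in the question, without the timing code.
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i)
            a[i][k] = a[i][k] / a[k][k];
        for (int i = k + 1; i < N; ++i)
            for (int j = k + 1; j < N; ++j)
                a[i][j] = a[i][j] - a[i][k] * a[k][j];
    }
    // Count entries that became infinity or NaN during the elimination.
    long long nonFinite = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (!std::isfinite(a[i][j]))
                ++nonFinite;
    std::cout << "non-finite entries: " << nonFinite << "\n";
}

If this explanation is right, the slow sizes should report a large count and the fast sizes should report zero.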
And because you use long double, the compiled program does that arithmetic on the x87 FPU (you can reproduce the effect with float or double as well if you compile for the FPU instead of SSE; see the compile-flag sketch after the snippet below). It seems that the FPU handles infinite numbers much more slowly than normal ones.
You can easily reproduce this issue with this snippet:
int main() {
    volatile long double z = 2; // volatile keeps the compiler from optimizing the loop away
    for (int i = 0; i < 10000000; ++i) {
        z *= z; // repeated squaring overflows to infinity within a few iterations
    }
    return z > 1; // use z; returning the overflowed value directly as int would be undefined
}
If you initialize z with 2, this program runs slowly, because z overflows to infinity. If you initialize it with 1 instead, it becomes fast, because z never overflows.
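For the float/double variant mentioned above, the math unit is chosen at compile time. Here is a sketch assuming a GCC-style toolchain on 32-bit x86 (-mfpmath=387 and -mfpmath=sse are standard GCC options; the file name repro.cpp is mine, and your timings will differ):

// repro.cpp -- the same overflow loop with double instead of long double.
// Build for the x87 FPU (slow once z overflows):
//   g++ -m32 -mfpmath=387 repro.cpp -o repro
// Build for SSE (runs at roughly the same speed either way):
//   g++ -m32 -mfpmath=sse -msse2 repro.cpp -o repro
int main() {
    volatile double z = 2; // 2 overflows to infinity; 1 stays finite
    for (int i = 0; i < 10000000; ++i) {
        z *= z;
    }
    return z > 1;
}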
You can read more about this here: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
Here's the relevant part:
Performance implications on the x87 FPU
The performance of Intel’s x87 units on these NaNs and infinities is pretty bad. [...] Even today, on a SandyBridge processor, the x87 FPU causes a slowdown of about 370 to one on NaNs and infinities.
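If you want to measure the penalty on your own machine, here is a rough timing sketch of mine (on x86, g++ implements long double with the x87 unit even in 64-bit builds, so no special flags should be needed; the exact ratio depends on your CPU):

#include <ctime>
#include <iostream>
#include <limits>

// Multiply a volatile long double by a finite factor 10 million times
// and return the elapsed time in seconds.
static double timeLoop(long double start) {
    volatile long double z = start;
    clock_t t = clock();
    for (int i = 0; i < 10000000; ++i) {
        z *= 1.0000001L; // a finite start stays finite; an infinite start stays infinite
    }
    return double(clock() - t) / CLOCKS_PER_SEC;
}

int main() {
    std::cout << "finite:   " << timeLoop(1.0L) << " s\n";
    std::cout << "infinity: " << timeLoop(std::numeric_limits<long double>::infinity()) << " s\n";
}

On hardware with this penalty, the infinity case should be dramatically slower, which matches the discontinuities in the question.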