I am seeing discontinuities in the sequence of execution times as the input size varies. Specifically, I have been timing this code:
#include <iostream>
#include <sstream>
#include <cstdlib>
#include <ctime>
using namespace std;

long double a[2000][2000];
int iter = 0;

int main(int argc, char const *argv[]) {
    istringstream is(argv[1]);
    int N;
    is >> N;
    for (int i = 0; i <= N; ++i) {
        for (int J = 0; J <= N; ++J) {
            a[i][J] = (rand() % 3 + 1) * (rand() % 4 + 1);
        }
    }
    clock_t clk = clock();
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i) {
            a[i][k] = a[i][k] / a[k][k];
        }
        for (int i = k + 1; i < N; ++i) {
            for (int j = k + 1; j < N; ++j) {
                iter++;
                a[i][j] = a[i][j] - a[i][k] * a[k][j];
            }
        }
    }
    clk = clock() - clk;
    cout << "Time: " << ((double)clk) / CLOCKS_PER_SEC << "\n";
    cout << iter << endl;
}
I am compiling it with g++ 5.4.1 as C++14.
I tried the code for various values of N, and something really weird happens around N = 500. The execution times (as output by the program) are listed below:
N = 200 : 0.022136
N = 300 : 0.06792
N = 400 : 0.149622
N = 500 : 11.8341
N = 600 : 0.508186
N = 700 : 0.805481
N = 800 : 1.2062
N = 900 : 1.7092
N = 1000 : 2.35809
I ran the N = 500 case many times, and also on another machine, and got similar results.
Zooming in around N = 500, the timings are:
N = 494 : 0.282626
N = 495 : 0.284564
N = 496 : 11.5308
N = 497 : 0.288031
N = 498 : 0.289903
N = 499 : 11.9615
N = 500 : 12.4032
N = 501 : 0.293737
N = 502 : 0.295729
N = 503 : 0.297859
N = 504 : 12.4154
N = 505 : 0.301002
N = 506 : 0.304718
N = 507 : 12.4385
Why is this happening?
Your program can run into floating-point overflows and operations that produce NaN for certain inputs: if one calculation results in infinity/NaN, it spreads through your algorithm, so almost all of the numbers become infinity/NaN. Whether that happens depends on rand()'s output; if you change the seed with srand(), you may not get a slowdown for the N = 500 case.
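One way to check this (my own sketch, not part of the question's program; the seed 12345 is arbitrary) is to rerun the elimination and count the entries that end up non-finite:

#include <cmath>
#include <cstdlib>
#include <iostream>

long double a[2000][2000];

int main() {
    const int N = 500;
    srand(12345); // arbitrary seed; the slowdown depends on rand()'s output
    for (int i = 0; i <= N; ++i)
        for (int j = 0; j <= N; ++j)
            a[i][j] = (rand() % 3 + 1) * (rand() % 4 + 1);
    // Same elimination as in the question, without the timing code.
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i)
            a[i][k] = a[i][k] / a[k][k];
        for (int i = k + 1; i < N; ++i)
            for (int j = k + 1; j < N; ++j)
                a[i][j] = a[i][j] - a[i][k] * a[k][j];
    }
    // Count entries that became infinity or NaN during the elimination.
    long long nonFinite = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (!std::isfinite(a[i][j]))
                ++nonFinite;
    std::cout << "non-finite entries: " << nonFinite << "\n";
}

If this explanation is right, the slow sizes should report a large count and the fast sizes should report zero.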
And because you use long double, the compiled program does that arithmetic on the x87 FPU (you can reproduce the effect with float or double as well if you compile for the FPU instead of SSE; see the compile-flag sketch after the snippet below). It seems that the FPU handles infinite numbers much more slowly than normal ones.
You can easily reproduce this issue with this snippet:
int main() {
    volatile long double z = 2; // volatile keeps the compiler from optimizing the loop away
    for (int i = 0; i < 10000000; ++i) {
        z *= z; // repeated squaring overflows to infinity within a few iterations
    }
    return z > 1; // use z; returning the overflowed value directly as int would be undefined
}
If you initialize z with 2, this program runs slowly, because z overflows to infinity. If you initialize it with 1 instead, it becomes fast, because z never overflows.
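For the float/double variant mentioned above, the math unit is chosen at compile time. Here is a sketch assuming a GCC-style toolchain on 32-bit x86 (-mfpmath=387 and -mfpmath=sse are standard GCC options; the file name repro.cpp is mine, and your timings will differ):

// repro.cpp -- the same overflow loop with double instead of long double.
// Build for the x87 FPU (slow once z overflows):
//   g++ -m32 -mfpmath=387 repro.cpp -o repro
// Build for SSE (runs at roughly the same speed either way):
//   g++ -m32 -mfpmath=sse -msse2 repro.cpp -o repro
int main() {
    volatile double z = 2; // 2 overflows to infinity; 1 stays finite
    for (int i = 0; i < 10000000; ++i) {
        z *= z;
    }
    return z > 1;
}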
You can read more about this here: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
Here's the relevant part:
Performance implications on the x87 FPU
The performance of Intel’s x87 units on these NaNs and infinities is pretty bad. [...] Even today, on a SandyBridge processor, the x87 FPU causes a slowdown of about 370 to one on NaNs and infinities.
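If you want to measure the penalty on your own machine, here is a rough timing sketch of mine (on x86, g++ implements long double with the x87 unit even in 64-bit builds, so no special flags should be needed; the exact ratio depends on your CPU):

#include <ctime>
#include <iostream>
#include <limits>

// Multiply a volatile long double by a finite factor 10 million times
// and return the elapsed time in seconds.
static double timeLoop(long double start) {
    volatile long double z = start;
    clock_t t = clock();
    for (int i = 0; i < 10000000; ++i) {
        z *= 1.0000001L; // a finite start stays finite; an infinite start stays infinite
    }
    return double(clock() - t) / CLOCKS_PER_SEC;
}

int main() {
    std::cout << "finite:   " << timeLoop(1.0L) << " s\n";
    std::cout << "infinity: " << timeLoop(std::numeric_limits<long double>::infinity()) << " s\n";
}

On hardware with this penalty, the infinity case should be dramatically slower, which matches the discontinuities in the question.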