I have written a simple Gaussian elimination algorithm using a std::vector of doubles in C++ (gcc / Linux). Now I have seen that the runtime depends on the optimization level of the compiler (up to 5 times faster with -O3). I wrote a small test program and received similar results. The problem is neither the allocation of the vector nor any resizing.
It's the simple fact that the statement
v[i] = x + y / z;
(or something like that) is much slower without optimization. I think the problem is the index operator. Without compiler optimization, the std::vector is slower than a raw double *v, but when I turn on optimization, the performance is equal and, to my surprise, even access to the raw double *v is faster.
Is there an explanation for this behaviour? I'm really not a professional developer, but I thought the compiler should be able to translate statements like the one above fairly directly into hardware instructions. Why is there a need to turn on optimization, and, more importantly, what is the disadvantage of optimization? (If there is none, I wonder why optimization is not the default.)
Here is my vector test code:
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

const long int count = 100000;
const double pi = 3.1416;

void C_array (long int size)  // raw double* version
{
    long int start = time(0);
    double *x = (double*) malloc (size * sizeof(double));
    for (long int n = 0; n < count; n++)
        for (long int i = 0; i < size; i++)
            x[i] = i;
            //x[i] = pi * (i-n);
    printf ("C array : %li s\n", time(0) - start);
    free (x);
}

void CPP_vector (long int size)  // std::vector version
{
    long int start = time(0);
    std::vector<double> x(size);
    for (long int n = 0; n < count; n++)
        for (long int i = 0; i < size; i++)
            x[i] = i;
            //x[i] = pi * (i-n);
    printf ("C++ vector: %li s\n", time(0) - start);
}

int main ()
{
    printf ("Size of vector: ");
    long int size;
    scanf ("%li", &size);
    C_array (size);
    CPP_vector (size);
    return 0;
}
I received some weird results. A standard g++ compile produces a runtime of 8 s (C array) or 18 s (std::vector) for a vector size of 20,000. If I use the more complex line behind the //, the runtimes are 8 s / 15 s (yes, faster). With -O3, the runtime is 5 s / 5 s for a vector size of 40,000.
Why do we want separate debug and release builds?
Optimization may completely reorder the sequence of instructions, eliminate variables, inline function calls, and move the executable code so far from the source code that you can no longer debug it. So one reason for not using optimization is to keep the ability to debug the code. When your code is (or when you believe your code is) fully debugged, you can turn on optimization (with g++, typically -O2 or -O3) to produce a release build.
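As a small illustration of why optimized code is hard to step through (a hypothetical snippet; halve and f are made-up names), consider what -O3 typically does here:

// A trivial helper that the optimizer will typically inline.
static double halve(double d) { return d / 2.0; }

double f()
{
    double a = halve(8.0); // at -O3: the call is inlined and constant-folded
    double b = a + 1.0;    // at -O3: folded away, so 'a' and 'b' never exist
    return b;              // the whole function reduces to "return 5.0;"
}
// A debugger stepping through the -O3 build has no 'a' or 'b' to inspect
// and no call to halve() to step into.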
Why is the debug code slow?
Without optimization (-O0), the compiler translates each statement in isolation: every variable lives in memory rather than in a register (so a debugger can always inspect it), and small functions such as std::vector's operator[] are not inlined. Every v[i] is therefore a genuine function call, while the raw double *v access is just pointer arithmetic with no call at all. Once the optimizer inlines the operator, both versions compile down to the same indexed store.
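A minimal sketch of where the cost comes from (a simplified stand-in, not the real libstdc++ implementation):

#include <cstddef>

// Simplified stand-in for std::vector<double>: the index operator is an
// ordinary member function, just as it is in the real class.
struct VecLike {
    double *data;
    double& operator[] (std::size_t i) { return data[i]; }
};

void fill (VecLike &v, double *raw, long int size)
{
    for (long int i = 0; i < size; i++) {
        v[i]   = i; // at -O0: an actual call to operator[], then the store
        raw[i] = i; // at -O0: plain pointer arithmetic and a store
    }
    // At -O3 the operator[] call is inlined, and both assignments
    // compile to the same indexed store instruction.
}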
What to do?
You could try to show a simple compilable example exhibiting the behavior. We could then compile it and look at the assembly to explain what is really going on. The size of the data you are processing is important if you are hitting a cache issue.
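For instance, a reduced pair of functions like the following (my suggestion, not code from the question) is small enough that you can compare the generated assembly of the two builds, e.g. with g++ -S -O0 test.cpp versus g++ -S -O3 test.cpp:

#include <vector>

// At -O0 the v[i] store goes through a call to std::vector's operator[];
// at -O3 it becomes a plain indexed store.
void store_vector (std::vector<double> &v, long int size)
{
    for (long int i = 0; i < size; i++)
        v[i] = i;
}

// The raw pointer version has no call to inline in the first place.
void store_pointer (double *p, long int size)
{
    for (long int i = 0; i < size; i++)
        p[i] = i;
}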