Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do I have to turn on optimization in g++ for simple array access?

I have written a simple Gaussian elimination algorithm using a std::vector of doubles in C++ (gcc / Linux). Now I have seen that the runtime depends on the optimization level of the compiler (up to 5-times faster with -O3). I wrote a small test program and received similar results. The problem is not the allocation of the vector nor any resizing etc.

It's the simple fact that the statement:

v[i] = x + y / z;

(or something like that) is much slower without optimization. I think the problem is the index operator. Without compiler optimization, the std::vector is slower than a raw double *v, but when I turn on optimization, the performance is equal and, to my surprise, even the access to the raw double *v is faster.

Is there an explanation for this behaviour? I'm really not a professional developer, but I thought the compiler should be able to transfer statements like the above one rather directly to hardware instructions. Why is there a need to turn on an optimization and, more importantly, what is the disadvantage of the optimization? (If there is none, I wonder why the optimization is not the standard.)

Here is my vector test code:

const long int count = 100000;
const double pi = 3.1416;
void C_array (long int size)
{
  long int start = time(0);
  double *x = (double*) malloc (size * sizeof(double));
  for (long int n = 0; n < count; n++)
    for (long int i = 0; i < size; i++)
      x[i] = i;
      //x[i] = pi * (i-n);
  printf ("C array   : %li s\n", time(0) - start);
  free (x);
}

void CPP_vector (long int size)
{
  long int start = time(0);
  std::vector<double> x(size);
  for (long int n = 0; n < count; n++)
    for (long int i = 0; i < size; i++)
      x[i] = i;
      //x[i] = pi * (i-n);
  printf ("C++ vector: %li s\n", time(0) - start);
}

int main ()
{
  printf ("Size of vector: ");
  long int size;
  scanf ("%li", &size);

  C_array (size);
  CPP_vector (size);
  return 0;
}

I received some weird results. A standard g++ compile produces a runtime 8 s (C array) or 18 s (std::vector) for a vector size of 20 000. If I use the more complex line behind the //.., the runtime is 8 / 15 s (yes, faster). If I turn on -O3 then, the runtime is 5 / 5 s for a 40,000 vector size.

like image 432
Professional_Amateur Avatar asked Oct 28 '14 19:10

Professional_Amateur


1 Answers

Why do we want optimization/debug releases ?

Optimization may completely reorder the sequence of instructions, eliminate variables, inline functions calls and make the executable code so far away of the source code that you cannot debug it. So, one of the reason for not using optimization is to keep the possibility to debug the code. When your code is (when you believe your code is) fully debugged, you can turn on optimization to produce a release build.

Why is the debug code slow ?

  • One thing to keep in mind is that a debug version of the STL may contain additional checks for boundaries and validity of iterators. This can slow down the code by a factor of 10. This is known to be an issue with the Visual C++ STL, but in your case you are not using it. I don't know the state of the art of the gcc's STL.
  • Another possibility is that you are accessing the memory in a non linear sequence, producing lots of cache misses. When in debug mode, the compiler will respsect your code and produce this inefficient code. But when optimization is on, it may rewrite your accesses to be sequential and not produce any cache miss.

What to do ?

You could try to show a simple compilable example exhibiting the behavior. We could then compile and look at the assembly to explain what is really going on. The size of the data you're processing is important if you hit a cache issue.

Links

  • Visual C++ STL is slow in debug mode: http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/
  • What does the debug version of the STL do with Visual C++: http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Advanced-STL/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-3-of-n
  • Cache miss and their impact: http://channel9.msdn.com/Events/Build/2014/2-661 , specially from 29'27"
  • Cache again: https://www.youtube.com/watch?v=fHNmRkzxHWs at 36'34"
like image 128
fjardon Avatar answered Oct 04 '22 14:10

fjardon