Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizer bug or programming error?

First of all: I know that most optimization bugs are due to programming errors or relying on facts which may change depending on optimization settings (floating point values, multithreading issues, ...).

However I experienced a very hard to find bug and am somewhat unsure if there is any way to prevent these kind of errors from happening without turning the optimization off. Am I missing something? Could this really be an optimizer bug? Here's a simplified example:

struct Data {
  int    a;
  int    b;
  double c;
};

struct Test {
  void optimizeMe();

  Data m_data;
};

void Test::optimizeMe() {
  Data * pData; // Note that this pointer is not initialized!

  bool first = true;

  for (int i = 0; i < 3; ++i) {
    if (first) {
      first = false;

      pData = &m_data;

      pData->a = i * 10;
      pData->b = i * pData->a;
      pData->c = pData->b / 2;
    } else {
      pData->a = ++i;
    } // end if
  } // end for
};

int main(int argc, char *argv[]) {
  Test test;
  test.optimizeMe();
  return 0;
}

The real program of course has a lot more to do than this. But it all boils down to the fact that instead of accessing m_data directly, a (previously unitialized) pointer is being used. As soon as I add enough statements to the if (first)-part, the optimizer seems to change the code to something along these lines:

if (first) {
  first = false;

  // pData-assignment has been removed!

  m_data.a = i * 10;
  m_data.b = i * m_data.a;
  m_data.c = m_data.b / m_data.a;
} else {
  pData->a = ++i; // This will crash - pData is not set yet. 
} // end if

As you can see, it replaces the unnecessary pointer dereference with a direct write to the member struct. However it does not do this in the else-branch. It also removes the pData-assignment. Since the pointer is now still unitialized, the program will crash in the else-branch.

Of course there are various things which could be improved here, so you might blame it on the programmer:

  • Forget about the pointer and do what the optimizer does - use m_data directly.
  • Initialize pData to nullptr - that way the optimizer knows that the else-branch will fail if the pointer is never assigned. At least it seems to solve the problem in my test-environment.
  • Move the pointer assignment in front of the loop (effectively initializing pData with &m_data, which then could also be a reference instead of a pointer (for good measure). This makes sense because pData is needed in all cases so there is no reason to do this inside the loop.

The code is obviously smelly, to say the least, and I'm not trying to "blame" the optimizer for doing this. But I'm asking: What am I doing wrong? The program might be ugly, but it's valid code...

I should add that I'm using VS2012 with C++/CLI and v110_xp-Toolset. Optimization is set to /O2. Please also note that if you really want to reproduce the problem (that's not really the point of this question though) you need to play around with the complexity of the program. This is a very simplified example and the optimizer sometimes doesn't remove the pointer assignment. Hiding &m_data behind a function seems to "help".

EDIT:

Q: How do I know that the compiler is optimizing it to something like the example provided?

A: I'm not very good at reading assembler, I have looked at it however and have made 3 observations which make me believe that it's behaving this way:

  1. As soon as optimization kicks in (adding more assignments usually does the trick) the pointer assignment has no associated assembler statement. It also hasn't been moved up to the declaration, so it's really left uninitialized it seems (at least to me).
  2. In cases where the program crashes, the debugger skips the assignment statement. In cases where the program runs without problems, the debugger stops there.
  3. If I watch the content of pData and the content of m_data while debugging, it clearly shows that all assignments in the if-branch have an effect on m_data and m_data receives the correct values. The pointer itself it still pointing to the same uninitialized value it had from the beginning. Therefore I have to assume that it is in fact not using the pointer to make the assignments at all.

Q: Does it have to do anything with i (Loop unrolling)?

A: No, the actual program actually uses do { ... } while() to loop over a SQL SELECT-resultset so the iteration count is completely runtime-specific and cannot be predetermined by the compiler.

like image 349
Excelcius Avatar asked Apr 22 '13 18:04

Excelcius


2 Answers

It sure looks like an bug to me. It's fine for the optimizer to eliminate the unnecessary redirection, but it should not eliminate the assignment to pData.

Of course, you can work around the problem by assigning to pData before the loop (at least in this simple example). I gather that the problem in your actual code isn't as easily resolved.

like image 104
Ted Hopp Avatar answered Oct 21 '22 22:10

Ted Hopp


I also vote for an optimizer bug if it is really reproducible in this example. To overrule the optimizer you could try to declare pData as volatile.

like image 1
j.holetzeck Avatar answered Oct 21 '22 22:10

j.holetzeck