
Bug fixed with four nops in an if(0), world no longer makes sense

I was writing a function to figure out if a given system of linear inequalities has a solution, when all of a sudden it started giving the wrong answers after a seemingly innocuous change.

I undid some changes, re-did them, and then proceeded to fiddle for the next two hours, until I had reduced it to absurdity.

The following, inserted anywhere into the function body, but nowhere else in the program, fixes it:

if(0) {
    __asm__("nop\n");
    __asm__("nop\n");
    __asm__("nop\n");
    __asm__("nop\n");
}

It's for a school assignment, so I probably shouldn't post the function on the web, but this is so ridiculous that I don't think any context is going to help you. And all the function does is a bunch of math and looping. It doesn't even touch memory that isn't allocated on the stack.

Please help me make sense of the world! I'm loath to chalk it up to GCC, since the first rule of debugging is not to blame the compiler. But heck, I'm about to. I'm running Mac OS X 10.5 on a G5 tower, and the compiler in question identifies itself as 'powerpc-apple-darwin9-gcc-4.0.1', but I'm thinking it could be an impostor...

UPDATE: Curiouser and curiouser... I diffed the .s files with the nops and without. Not only are there too many differences to check, but without the nops the .s file is 196,620 bytes, and with them it's 156,719 bytes. (!)

UPDATE 2: Wow, should have posted the code! I came back to the code today, with fresh eyes, and immediately saw the error. See my sheepish self-answer below.

Casey Rodarmor asked Apr 02 '09 04:04



2 Answers

Most of the time, when an inconsequential change to the code fixes your problem, it's a memory corruption problem of some sort. We may need to see the actual code to analyze it properly, but that would be my first guess, based on the available information.
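One way to test the memory-corruption guess is to surround a suspect array with sentinel values and check them after the code has run. The following is only a minimal sketch of that idea, not the asker's code: `guarded_array`, `guarded_init`, `guarded_data`, `guards_intact`, and the sizes and sentinel value are all hypothetical names chosen for illustration. The guards and the data live in one real buffer, so a stray write just past the logical array is still inside the buffer and can be detected safely.

```c
#include <stddef.h>

#define GUARD_WORDS 4            /* hypothetical guard size on each side */
#define DATA_WORDS  4            /* size of the logical array, like int a[4] */
#define GUARD_VALUE 0x5AFEC0DE   /* arbitrary sentinel pattern */
#define TOTAL_WORDS (GUARD_WORDS + DATA_WORDS + GUARD_WORDS)

/* One real buffer laid out as [guard | data | guard]. */
typedef struct {
    int words[TOTAL_WORDS];
} guarded_array;

/* Fill both guard regions with the sentinel and zero the data. */
void guarded_init(guarded_array *g) {
    for (size_t i = 0; i < TOTAL_WORDS; i++) {
        int in_guard = (i < GUARD_WORDS) || (i >= GUARD_WORDS + DATA_WORDS);
        g->words[i] = in_guard ? GUARD_VALUE : 0;
    }
}

/* Pointer to the logical array; valid indices are 0..DATA_WORDS-1. */
int *guarded_data(guarded_array *g) {
    return g->words + GUARD_WORDS;
}

/* Returns 1 if every guard word still holds the sentinel, 0 otherwise. */
int guards_intact(const guarded_array *g) {
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (g->words[i] != GUARD_VALUE)
            return 0;
        if (g->words[GUARD_WORDS + DATA_WORDS + i] != GUARD_VALUE)
            return 0;
    }
    return 1;
}
```

Calling `guards_intact` after the suspect loop would show whether something wrote just before or just after the array. It cannot catch writes that jump over the guard region entirely, so a clean result does not prove the code is correct.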

paxdiablo answered Sep 26 '22 19:09



It's faulty pointer arithmetic, either directly (through a pointer) or indirectly (by going past the end of an array). Check all your arrays. Don't forget that if your array is

 int a[4];

then a[4] doesn't exist.

What you're doing is overwriting something on the stack accidentally. The stack contains locals, parameters, and the return address from your function. You might be damaging the return address in a way that the extra nops cure.

For example, if you have some code that is adding something to the return address, inserting those extra 16 bytes of nops would cure the problem, because instead of returning past the next line of code, you return into the middle of some nops.

One way you might be adding something to the return address is by going past the end of a local array or a parameter, for example

  int a[4];
  a[4]++;
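
To hunt for this kind of overrun, it can help to route suspect writes through a helper that refuses an out-of-range index instead of silently touching the neighboring stack slot. This is just a sketch under that assumption: `ARRAY_COUNT` and `checked_increment` are hypothetical names, not anything from the asker's program.

```c
#include <stddef.h>

/* Number of elements in a true array (does not work on a plain pointer). */
#define ARRAY_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical helper: increments a[index] only when index is in range.
   Returns 1 on success, 0 if the index is out of bounds. */
int checked_increment(int *a, size_t count, size_t index) {
    if (index >= count)
        return 0;   /* would touch memory past the array, so refuse */
    a[index]++;
    return 1;
}
```

With `int a[4];`, the call `checked_increment(a, ARRAY_COUNT(a), 4)` returns 0 and leaves the stack alone, whereas `a[4]++` would modify whatever happens to sit next to the array.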
Joel Spolsky answered Sep 23 '22 19:09
