Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is GCC's option -O2 breaking this small program or do I have undefined behavior [duplicate]

I found this problem in a very large application, have made an SSCCE from it. I don't know whether the code has undefined behavior or -O2 breaks it.

When compiling it with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5.

But it prints 25 when compiling without -O2 (eg -O1) or uncommenting one of the 2 commented lines (prevent inlining).

#include <stdio.h>
#include <stdlib.h>
// __attribute__((noinline)) 
int f(int* todos, int input) {
    int* cur = todos-1; // fixes the ++ at the beginning of the loop
    int result = input;
    while(1) {
        cur++;
        int ch = *cur;
        // printf("(%i)\n", ch);
        switch(ch) {
            case 0:;
                goto end;
            case 1:;
                result = result*result;
            break;
        }
    }
    end:
    return result;
}
int main() {
    int todos[] = { 1, 0}; // 1:square, 0:end
    int input = 5;
    int result = f(todos, input);
    printf("=%i\n", result);
    printf("end\n");
    return 0;
}

Is GCC's option -O2 breaking this small program or do I have undefined behavior somewhere?

like image 998
Bernd Elkemann Avatar asked May 15 '14 15:05

Bernd Elkemann


2 Answers

int* cur = todos-1;

invokes undefined behavior. todos - 1 is an invalid pointer address.

Emphasis mine:

(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

like image 174
ouah Avatar answered Nov 17 '22 00:11

ouah


In supplement to @ouah's answer, this explains what the compiler is doing.

Generated assembler for reference:

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       be 05 00 00 00          mov    $0x5,%esi
  400459:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  400465:       00 
  400466:       48 83 c0 04             add    $0x4,%rax
  40046a:       8b 10                   mov    (%rax),%edx

However if I add a printf in main():

  400450:       48 83 ec 18             sub    $0x18,%rsp
  400454:       bf 84 06 40 00          mov    $0x400684,%edi
  400459:       31 c0                   xor    %eax,%eax
  40045b:       48 89 e6                mov    %rsp,%rsi
  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
  40046c:       00 
  40046d:       e8 ae ff ff ff          callq  400420 <printf@plt>
  400472:       48 8d 44 24 fc          lea    -0x4(%rsp),%rax
  400477:       be 05 00 00 00          mov    $0x5,%esi
  40047c:       48 83 c0 04             add    $0x4,%rax
  400480:       8b 10                   mov    (%rax),%edx

Specifically (in the printf version), these two instructions populate the todo array

  40045e:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  400465:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)

This is conspicuously missing from the non-printf version, which for some reason only assigns the second element:

  40045e:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
like image 28
Mark Nunberg Avatar answered Nov 16 '22 23:11

Mark Nunberg