
C gives different output based on optimization level (new example)

Based on the very good blog post "The Strict Aliasing Situation is Pretty Bad", I've put the following piece of code online so you can test it:

http://cpp.sh/9kht (output changes between -O0 and -O2)

#include <stdio.h>

long foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;
}

int main(void) {
  long l;
  printf("%ld\n", foo((int *)&l, &l));
}
  • Is there some sort of undefined behaviour here?

  • What is going on internally when we choose the -O2 level?

Dave5545 asked Mar 15 '16 16:03


2 Answers

  1. Yes, this program has undefined behavior, because of the type-based aliasing rules, which can be summarized as "you cannot access a memory location declared with type A through a pointer of type B, except when B is a pointer to a character type (e.g. unsigned char *)." This is an approximation, but it is close enough for most purposes. Note that the exception only runs in one direction: when A is a character type, B may not be something else. Yes, this means the common idiom of accessing a byte buffer "four at a time" through a uint32_t * is undefined behavior (the blog post also touches on this).

  2. When compiling foo, the compiler assumes that x and y do not point to the same object, because an int * and a long * are not allowed to alias. From this, it infers that the write through y cannot change the value of *x, so it can just return the known value of *x, 0, without re-reading it from memory. It only does this when optimization is turned on because keeping track of what each pointer can and cannot point to is expensive (it makes compilation slower).

    Note that this is a "demons fly out of your nose" situation: the compiler is entitled to make the generated code for foo start with

    cmp  rx, ry
    beq  __crash_the_program
    ...
    

    (and a tool like UBSan might do just that)
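The inference in point 2 can be sketched at the source level. This is only an illustration: foo_as_optimized is a hypothetical name, and the code a real compiler emits at -O2 will differ in detail. It shows what foo effectively becomes once the compiler assumes that x and y point to different objects.

```c
/* The function from the question, unchanged. */
long foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  return *x;  /* without optimization, typically re-read from memory */
}

/* Hypothetical hand-written equivalent of what an optimizer is allowed to
 * produce: since an int * and a long * are assumed not to alias, the store
 * through y cannot change *x, so the return value is the constant 0. */
long foo_as_optimized(int *x, long *y) {
  *x = 0;
  *y = 1;
  return 0;   /* the known value of *x, not re-read */
}
```

When x and y really do point to different objects (say, an int i and a long l), both versions return 0 and leave i == 0 and l == 1, which is why the compiler considers the rewrite safe. Calling either one as in the question, with (int *)&l and &l, is still undefined behavior.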

zwol answered Oct 06 '22 00:10


Said another way, the cast (int *)&l tells the compiler to treat the address of l as a pointer to an int. It does not convert the long object itself into anything. The cast only lets you pass a long * to a function expecting an int *. You are lying to the compiler. Inside foo, x is expected to point to an int, but it actually points to a long, so the object is not of the type foo was promised. The results are, as you can see, unpredictable.
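If the intent really is to write an int's worth of bytes into the start of a long and read them back, memcpy can do that without lying about the pointer type. A minimal sketch, assuming sizeof(int) <= sizeof(long) and an int without padding bits (true on common platforms); foo_memcpy is a hypothetical name, not something from the question or the blog post.

```c
#include <string.h>

/* Hypothetical defined-behavior variant of foo: the int-sized accesses go
 * through memcpy, which copies bytes and is allowed to touch any object. */
long foo_memcpy(long *y) {
  int zero = 0;
  int first;

  memcpy(y, &zero, sizeof zero);    /* stands in for "*x = 0" */
  *y = 1;
  memcpy(&first, y, sizeof first);  /* stands in for "return *x" */
  return first;
}
```

Called as foo_memcpy(&l), this returns the first sizeof(int) bytes of the long value 1 interpreted as an int: 1 on a little-endian machine (or wherever int and long have the same size), 0 on a big-endian machine with a wider long. The value depends on the platform's representation, but it is the same at every optimization level.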

On another note, I wouldn't ever use l (ell) as a variable name. It is too easily confused with 1 (one). For example, what is this?

int x = l;
Blake McBride answered Oct 05 '22 22:10