C code with undefined results, compiler generates invalid code (with -O3)

Tags:

c

gcc

undefined-behavior

I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...

Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?

Here's the (simple) program I'm using:

int main() {
    char *ptr = 0;
    *(ptr) = 0;
}

I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.

Edit: It's generating a ud2 instruction...

asked Oct 10 '14 23:10

fwenom

1 Answer

The ud2 instruction is a "valid instruction": it stands for Undefined Instruction and generates an invalid opcode exception. clang, and apparently gcc, can generate this instruction when a program invokes undefined behavior.

From the clang link above, the rationale is explained as follows:

Stores to null and calls through null pointers are turned into a __builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the result of other transformations like inlining and constant propagation) and we used to just delete the blocks that contained them because they were "obviously unreachable".

While (from a pedantic language lawyer standpoint) this is strictly true, we quickly learned that people do occasionally dereference null pointers, and having the code execution just fall into the top of the next function makes it very difficult to understand the problem. From the performance angle, the most important aspect of exposing these is to squash downstream code. Because of this, clang turns these into a runtime trap: if one of these is actually dynamically reached, the program stops immediately and can be debugged. The drawback of doing this is that we slightly bloat code by having these operations and having the conditions that control their predicates.

At the end of the day, once you invoke undefined behavior, the behavior of your program is unpredictable. The philosophy here is that it is probably better to crash hard, giving the developer an indication that something is seriously wrong and letting them debug from the right point, than to produce a program that seems to work but is actually broken.
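
To make the "crash hard" idea concrete, here is a minimal sketch of getting the same effect deliberately, with defined behavior, instead of relying on what the optimizer does with a null store. The helper name store_byte_or_trap is hypothetical (it is not from the question or from clang), and __builtin_trap() is a GCC and Clang builtin, so this assumes one of those compilers.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: store a byte through p,
 * but stop the program immediately if p is null rather than performing
 * a store whose behavior is undefined.
 * Assumes GCC or Clang, which provide __builtin_trap(). */
void store_byte_or_trap(char *p, char value) {
    if (p == NULL) {
        __builtin_trap();  /* on x86 this is typically emitted as ud2 */
    }
    *p = value;
}
```

With a valid pointer the store happens as usual. With a null pointer the program terminates abnormally at the check, which is the behavior the quoted rationale aims for.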

As Ruslan notes, ud2 is "valid" in the sense that it is guaranteed to raise an invalid opcode exception, as opposed to other unused opcode sequences, which may become valid in the future.
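
One way to see how such a trap reaches the program is to run the trapping code in a child process and check which signal ended it. The following is only a sketch: it assumes a POSIX system and GCC or Clang, and the names trap_now, return_normally, and signal_that_killed are hypothetical. It calls __builtin_trap() directly rather than storing through a null pointer, so the example itself has no undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for what the optimizer substitutes for the null store. */
void trap_now(void) {
    __builtin_trap();
}

/* Control case: does nothing and returns. */
void return_normally(void) {
}

/* Runs fn in a child process.
 * Returns the number of the signal that terminated the child,
 * 0 if the child was not killed by a signal, or -1 if fork or waitpid failed. */
int signal_that_killed(void (*fn)(void)) {
    assert(fn != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fn();
        _exit(0);  /* reached only if fn returned */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On x86 Linux, signal_that_killed(trap_now) would typically report SIGILL, where a real null store at -O0 would report SIGSEGV. The exact signal depends on the architecture and operating system, so treat the number as something to observe, not to rely on.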

answered Oct 18 '22 07:10

Shafik Yaghmour
