
C code with undefined results, compiler generates invalid code (with -O3)

I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...

Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?

Here's the (simple) program I'm using:

int main() {
    char *ptr = 0;
    *(ptr) = 0;
}

I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.

Edit: It's generating a ud2 instruction...

asked Oct 10 '14 23:10 by fwenom



1 Answer

The ud2 instruction is a "valid instruction": it stands for Undefined Instruction and generates an invalid opcode exception. clang, and apparently gcc, can generate this instruction when a program invokes undefined behavior.

From the clang link above, the rationale is explained as follows:

Stores to null and calls through null pointers are turned into a __builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the result of other transformations like inlining and constant propagation) and we used to just delete the blocks that contained them because they were "obviously unreachable".

While (from a pedantic language lawyer standpoint) this is strictly true, we quickly learned that people do occasionally dereference null pointers, and having the code execution just fall into the top of the next function makes it very difficult to understand the problem. From the performance angle, the most important aspect of exposing these is to squash downstream code. Because of this, clang turns these into a runtime trap: if one of these is actually dynamically reached, the program stops immediately and can be debugged. The drawback of doing this is that we slightly bloat code by having these operations and having the conditions that control their predicates.

At the end of the day, once you are invoking undefined behavior, the behavior of your program is unpredictable. The philosophy here is that it is probably better to crash hard, giving the developer an indication that something is seriously wrong and letting them debug from the right point, than to produce a program that seems to work but is actually broken.

As Ruslan notes, it is "valid" in the sense that it is guaranteed to raise an invalid opcode exception, as opposed to other unused sequences, which may become valid in the future.
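
To see the trap from the quoted passage without relying on undefined behavior, here is a minimal sketch that calls __builtin_trap() directly in a child process and reports which signal killed it. It assumes a POSIX system and a compiler that provides the builtin (gcc and clang both do). The function name signal_from_trap is made up for this illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs __builtin_trap() in a child process.
 * Returns the number of the signal that killed the child, 0 if the
 * child exited normally, or -1 if fork() or waitpid() failed. */
int signal_from_trap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the builtin that, per the quoted passage, clang emits
         * in place of a store through a null pointer. It does not
         * return. */
        __builtin_trap();
    }

    /* Parent: wait for the child and inspect how it terminated. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_from_trap() from main and printing the result should show a nonzero signal number. On x86 Linux this is typically SIGILL, since ud2 raises an invalid opcode exception; other targets may use a different trapping instruction and deliver a different signal.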

answered Oct 18 '22 07:10 by Shafik Yaghmour