Function not called in code gets called at runtime

Question

How can the following program be calling format_disk if it's never called in code?

#include <cstdio>

static void format_disk()
{
  std::puts("formatting hard disk drive!");
}

static void (*foo)() = nullptr;

void never_called()
{
  foo = format_disk;
}

int main()
{
  foo();
}

This differs from compiler to compiler. Compiled with Clang with optimizations on, the program behaves as if never_called had run, and format_disk executes at runtime:

$ clang++ -std=c++17 -O3 a.cpp && ./a.out
formatting hard disk drive!

Compiling with GCC, however, this code just crashes:

$ g++ -std=c++17 -O3 a.cpp && ./a.out
Segmentation fault (core dumped)

Compiler versions:

$ clang --version
clang version 5.0.0 (tags/RELEASE_500/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ gcc --version
gcc (GCC) 7.2.1 20171128
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Mário Feroldi · Accepted Answer

The program contains undefined behavior: calling through a null function pointer (i.e. calling foo() in main without first assigning it a valid address) is UB, so the standard imposes no requirements on what the program does.

Executing format_disk at runtime is a perfectly valid outcome once undefined behavior has been hit; it is just as valid as crashing (as happens when compiled with GCC). Okay, but why is Clang doing that? If you compile with optimizations off, the program no longer outputs "formatting hard disk drive!" and just crashes:

$ clang++ -std=c++17 -O0 a.cpp && ./a.out
Segmentation fault (core dumped)

The generated code for this version is as follows:

main:                                   # @main
        push    rbp
        mov     rbp, rsp
        call    qword ptr [foo]
        xor     eax, eax
        pop     rbp
        ret

It tries to call the function that foo points to, and since foo is initialized with nullptr (the same would hold without an initializer, because objects with static storage duration are zero-initialized), its value is zero. At this point undefined behavior has been hit, so anything can happen and the program is rendered useless. In practice, calling such an invalid address usually results in a segmentation fault, hence the message we get when executing the program.
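As a small illustration of the zero-initialization point, here is a minimal sketch. The names uninitialized_handler and handler_is_null are hypothetical and not part of the original program; the only assumption is that the pointer has static storage duration, as foo does.

```cpp
// No initializer here: objects with static storage duration are
// zero-initialized before any other initialization takes place,
// so this function pointer starts out as a null pointer.
static void (*uninitialized_handler)();

// Hypothetical helper: reports whether the pointer is still null.
bool handler_is_null()
{
  return uninitialized_handler == nullptr;
}
```

Calling handler_is_null() before anything assigns to the pointer returns true, which is why dropping the "= nullptr" from foo would not change the outcome.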

Now let's examine the same program but compiling it with optimizations on:

$ clang++ -std=c++17 -O3 a.cpp && ./a.out
formatting hard disk drive!

The generated code for this version is as follows:

never_called():                         # @never_called()
        ret
main:                                   # @main
        push    rax
        mov     edi, .L.str
        call    puts
        xor     eax, eax
        pop     rcx
        ret
.L.str:
        .asciz  "formatting hard disk drive!"

Interestingly, somehow optimizations modified the program so that main calls std::puts directly. But why did Clang do that? And why is never_called compiled to a single ret instruction?

Let's get back to the standard (N4660, specifically) for a moment. What does it say about undefined behavior?

3.27 undefined behavior [defns.undefined]

behavior for which this document imposes no requirements

[Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression never exhibits behavior explicitly specified as undefined ([expr.const]). — end note]

Emphasis mine.

A program that exhibits undefined behavior becomes useless: everything it has done so far and will do afterwards has no meaning if it contains erroneous data or constructs. With that in mind, remember that compilers may completely ignore the case in which undefined behavior is hit, and optimizers actually treat this as a learned fact. For instance, a construct like x + 1 > x (where x is a signed integer) will be folded to the constant true, even though the value of x is unknown at compile time. The reasoning is that the compiler wants to optimize for valid cases, and the only way for that construct to be valid is for it not to trigger arithmetic overflow (i.e. x != std::numeric_limits<decltype(x)>::max()). That is a newly learned fact for the optimizer, and based on it the construct is proven to always evaluate to true.

Note: the same optimization can't be applied to unsigned integers, because unsigned overflow is not UB. The compiler has to keep the expression as it is, since it evaluates differently when wraparound occurs (unsigned arithmetic is modulo 2^N, where N is the number of bits). Optimizing it away for unsigned integers would not conform to the standard (thanks aschepler).
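The signed and unsigned cases can be put side by side. This is only a sketch: the function names are made up for illustration, and whether a given compiler actually folds the signed version depends on the compiler and the optimization level.

```cpp
#include <limits>

// Signed: overflow in x + 1 is undefined behavior, so an optimizer may
// assume it never happens and reduce this function to "return true".
bool signed_increment_is_greater(int x)
{
  return x + 1 > x;
}

// Unsigned: arithmetic wraps modulo 2^N, so the comparison has to be
// evaluated. It is false exactly when x holds the maximum value.
bool unsigned_increment_is_greater(unsigned x)
{
  return x + 1 > x;
}
```

Passing std::numeric_limits<unsigned>::max() to the unsigned version is well-defined and yields false. Passing std::numeric_limits<int>::max() to the signed version is undefined behavior, so there is no result to speak of.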

This is useful as it allows for tons of optimizations to kick in. So far, so good, but what happens if x holds its maximum value at runtime? Well, that is undefined behavior, so it's nonsense to try to reason about it, as anything may happen and the standard imposes no requirements.

Now we have enough information to better examine your faulty program. We already know that calling through a null pointer is undefined behavior, and that's what causes the funny behavior at runtime. So let's try to understand why Clang (or technically LLVM) optimized the program the way it did.

static void (*foo)() = nullptr;

static void format_disk()
{
  std::puts("formatting hard disk drive!");
}

void never_called()
{
  foo = format_disk;
}

int main()
{
  foo();
}

Remember that it's possible to call never_called before the main entry starts executing. For example, when declaring a top-level variable, you can call it while initializing the value of that variable:

void never_called();
int x = (never_called(), 42);

If you write this snippet in your program, the program no longer exhibits undefined behavior, and the message "formatting hard disk drive!" is displayed, with optimizations either on or off.

So what's the only way this program is valid? There's this never_called function that assigns the address of format_disk to foo, so we might find something here. Note that foo is marked as static, which means it has internal linkage and can't be accessed from outside this translation unit. In contrast, the function never_called has external linkage and may be accessed from outside. If another translation unit contains a snippet like the one above, then this program becomes valid.

Cool, but nobody calls never_called from outside. Even so, the optimizer sees that the only way for this program to be valid is for never_called to be called before main executes; otherwise it's just undefined behavior. That's a newly learned fact, so the compiler assumes never_called is in fact called. Other optimizations that kick in afterwards may take advantage of that knowledge.

For instance, when constant folding is applied, it sees that the construct foo() is only valid if foo has been properly initialized. The only way for that to happen is for never_called to be called from outside this translation unit, so foo == format_disk.

Dead code elimination and interprocedural optimization might find out that if foo == format_disk, then the code inside never_called is unneeded, so the function's body is transformed into a single ret instruction.

Inline expansion optimization sees that foo == format_disk, so the call to foo can be replaced with its body. In the end, we end up with something like this:

never_called():
        ret
main:
        mov     edi, .L.str
        call    puts
        xor     eax, eax
        ret
.L.str:
        .asciz  "formatting hard disk drive!"

This is roughly equivalent to Clang's output with optimizations on. Of course, what Clang really did may well be different, but optimizations are nonetheless capable of reaching the same conclusion.
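Written back as C++, the result of those steps would look roughly like the sketch below. It is a hand-written approximation, not compiler output, and main_after_optimization is a hypothetical name standing in for main.

```cpp
#include <cstdio>

// The store to foo has been removed: after the call through foo was
// resolved to format_disk, nothing reads foo any more.
void never_called()
{
}

// Stand-in for main after the optimizer has finished with it.
int main_after_optimization()
{
  // foo() was resolved to format_disk, whose body was then inlined.
  std::puts("formatting hard disk drive!");
  return 0;
}
```

Neither foo nor format_disk survives as a separate entity, which matches the assembly above, where only never_called, main and the string literal remain.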

Examining GCC's output with optimizations on, it seems it didn't bother investigating:

.LC0:
        .string "formatting hard disk drive!"
format_disk():
        mov     edi, OFFSET FLAT:.LC0
        jmp     puts
never_called():
        mov     QWORD PTR foo[rip], OFFSET FLAT:format_disk()
        ret
main:
        sub     rsp, 8
        call    [QWORD PTR foo[rip]]
        xor     eax, eax
        add     rsp, 8
        ret

Executing that program results in a crash (segmentation fault), but if you call never_called in another translation unit before main gets executed, then this program doesn't exhibit undefined behavior anymore.

All of this can change drastically as more and more optimizations are engineered, so do not rely on the assumption that your compiler will take care of code containing undefined behavior. It might just as well work against you (and format your hard drive for real!).

I recommend reading What Every C Programmer Should Know About Undefined Behavior and A Guide to Undefined Behavior in C and C++. Both article series are very informative and will help you understand the state of the art.
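One way to keep such a program well-defined is to check the pointer before calling through it. The following is a minimal sketch and not part of the original program; greet, callback, install_callback and call_if_set are hypothetical names chosen for illustration.

```cpp
#include <cstdio>

static void greet()
{
  std::puts("hello");
}

// Starts out null, just like foo in the original program.
static void (*callback)() = nullptr;

void install_callback()
{
  callback = greet;
}

// Calls the callback only if one has been installed.
// Returns true if the call was made, false otherwise.
bool call_if_set()
{
  if (callback == nullptr) {
    return false;
  }
  callback();
  return true;
}
```

Because the null case is now handled explicitly, every execution path has defined behavior, and the optimizer has no undefined case it is free to assume away.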

Tags: c++, compiler-optimization, undefined-behavior, g++, clang++
