Consider the following code:
#include <string_view>

constexpr std::string_view f() { return "hello"; }

static constexpr std::string_view g() {
    auto x = f();
    return x.substr(1, 3);
}

int foo() { return g().length(); }
If I compile it with GCC 10.2 and the flags --std=c++17 -O1, I get:

foo():
        mov     eax, 3
        ret
Also, to my knowledge, this code does not suffer from any undefined behavior issues.
However, if I add the flag -fsanitize=undefined, the compilation result is:
.LC0:
        .string "hello"
foo():
        sub     rsp, 104
        mov     QWORD PTR [rsp+80], 5
        mov     QWORD PTR [rsp+16], 5
        mov     QWORD PTR [rsp+24], OFFSET FLAT:.LC0
        mov     QWORD PTR [rsp+8], 3
        mov     QWORD PTR [rsp+72], 4
        mov     eax, OFFSET FLAT:.LC0
        cmp     rax, -1
        jnb     .L4
.L2:
        mov     eax, 3
        add     rsp, 104
        ret
.L4:
        mov     edx, OFFSET FLAT:.LC0+1
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.Lubsan_data154
        call    __ubsan_handle_pointer_overflow
        jmp     .L2
.LC1:
        .string "/opt/compiler-explorer/gcc-10.2.0/include/c++/10.2.0/string_view"
.Lubsan_data154:
        .quad   .LC1
        .long   287
        .long   49
See this on Compiler Explorer.
My question: Why should the sanitization interfere with the optimization? Especially since the code doesn't seem to have any UB hazards...
Notes:
- [...] -O3 [...]
- If x is declared to be a constexpr variable, the sanitization doesn't prevent the optimization.
- [...] -O3).

UBSan is a runtime undefined behavior checker. It uses compile-time instrumentation to catch undefined behavior (UB): the compiler inserts code that performs certain kinds of checks before operations that may cause UB. If a check fails (i.e. UB is detected), a __ubsan_handle_* function is called to print an error message.
UndefinedBehaviorSanitizer (UBSan) is a fast undefined behavior detector. UBSan modifies the program at compile-time to catch various kinds of undefined behavior during program execution, for example: Array subscript out of bounds, where the bounds can be statically determined.
However, it is possible to determine whether a specific execution of a C++ program produced undefined behavior. One way to do this would be to write a C++ interpreter that steps through the code according to the definitions set out in the spec, determining at each point whether or not the code has undefined behavior.
Sanitizers add necessary instrumentation to detect violations at run-time. That instrumentation may prevent the function from being computed at compile-time as an optimization by introducing some opaque calls/side-effects that wouldn't be present there otherwise.
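To make the idea of "instrumentation" concrete, here is a rough sketch of the kind of test a pointer-overflow check performs. The helper name pointer_add_would_wrap is hypothetical and is not part of UBSan or libstdc++; it only models the arithmetic. In the sanitized listing from the question, the cmp rax, -1 / jnb .L4 pair appears to be this same test for base = .LC0 and offset = 1, with __ubsan_handle_pointer_overflow called on failure.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not a real UBSan or libstdc++ API.
// Models the check: would computing base + offset wrap around the address space?
constexpr bool pointer_add_would_wrap(std::uintptr_t base, std::uintptr_t offset) {
    // Unsigned arithmetic wraps on overflow, so a result smaller than
    // the starting value means the addition overflowed.
    return base + offset < base;
}
```

For an ordinary address such as 0x1000 the check passes; only a base at the very top of the address range would trip it. The real check is emitted inline by the compiler, and the call to the handler on the failure path is what looks like an opaque side effect to later optimization passes.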
The inconsistent behavior you see is because the g().length() call is not made in a constexpr context, so it's not required (well, "not expected" would be more accurate) to be computed at compile time. GCC likely has some heuristics to compute constexpr functions with constexpr arguments in regular contexts, and these don't trigger once sanitizers get involved, either because the constexpr-ness of the function is broken (due to the added instrumentation) or because one of the heuristics no longer applies.
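If the goal is to stop relying on those heuristics, one option is to put the call in a context that requires a constant expression. The following is a minimal sketch of that idea, not something tested against every GCC version; the local variable name len is an assumption introduced for illustration.

```cpp
#include <cstddef>
#include <string_view>

constexpr std::string_view f() { return "hello"; }

static constexpr std::string_view g() {
    auto x = f();
    return x.substr(1, 3);
}

int foo() {
    // Initializing a constexpr variable requires a constant expression,
    // so g().length() has to be evaluated during compilation here.
    constexpr std::size_t len = g().length();
    return static_cast<int>(len);
}
```

Since the value is fixed by the front end before any instrumentation is emitted, foo() should reduce to returning the constant with or without -fsanitize=undefined, though the exact output is still up to the compiler.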
Adding constexpr to x makes the f() call a constant expression (even if g() is not), so it's evaluated at compile time and doesn't need to be instrumented, which is enough for the other optimizations to trigger.
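For reference, the variant being described would look roughly like this. The only change from the code in the question is the constexpr on x.

```cpp
#include <string_view>

constexpr std::string_view f() { return "hello"; }

static constexpr std::string_view g() {
    // constexpr on x forces f() to be evaluated during compilation.
    constexpr auto x = f();
    return x.substr(1, 3);
}

int foo() { return g().length(); }
```

Note that the g().length() call in foo() is still not in a constexpr context; per the question, GCC 10.2 nevertheless folds it once x is constexpr.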
One can view that as a QoI issue, but in general it makes sense, as constexpr function evaluation can take arbitrarily long, so it's not always preferable to evaluate everything at compile time unless asked to.