The question is did we introduce undefined behaviour that trips optimizer, or can we file a bug report against gcc?
Sorry for lack of better title, but it's quite fragile and we are almost sure that this is a bug. The minimal example is not our favourite design but it's based on production code that crashed:
#include <iostream>
struct Node
{
Node(Node* parent) {
if(parent) {
parent->child_ = this;
}
}
Node* child()
{
return child_;
}
Node* child_ = nullptr;
};
void walk(Node* module, int cleanup) {
if(module != nullptr) {
if(!cleanup) {
std::cerr << "No cleanup";
}
walk(module->child(), cleanup);
if(cleanup) {
delete module;
}
}
}
int main (){
Node* top = new Node(nullptr);
Node* child = new Node(top);
walk(top,1);
}
Compiled with -O1 -foptimize-sibling-calls -ftree-vrp
. Godbolt example: https://gcc.godbolt.org/z/4VijKb
Program crashes calling module->child()
when module is 0x0
. Inspecting assembler we noticed that if (module != nullptr)
is skipped at the beginning of walk
. There is a check for cleanup
and call to work
seems to be unconditional, which results in trying to pull child_
from an invalid pointer.
The check is reestablished in assembly (and code seems to work) if:
-O1
is taken away.if(!cleanup)
is removed. (No side effect from cerr
)if(cleanup)
is removed. (Memory leak, but I think it counts as observable behaviour change)walk
is called before "No cleanup" if
. (Operation order)cleanup
type is changed to bool
from an int
. (Type change - but no observable behaviour change, I think).cerr << "text";
before and if(!cleanup)
. (Also an observable change.)It seems like a weird combination of tail-recursion and nullptr
check removal that resulted in wrong code. Possibly walk
got split into sibling functions based on cleanup
checks and got stitched wrongly(?).
Two candidates for UB were:
module
is non-nullptr
, but I don't see a way compiler could infer the result.int
in bool
context, but it's legal AFAIK.FWIW clang
seems to produce correct run-time, gcc 8.3
also has the assembly for the check present. 9.1
and trunk
not. We don't have any gcc expert at hand, so we had no idea why optimizer could be misled.
It does look like a GCC bug. I've been staring at this code for a while, and I can't find anything wrong with it whatsoever.
This is also reproducible with gcc
, not just g++
. It might be easier for the GCC developers to investigate if you write the minimal version of this in C. This C code reproduces the issue for me on GCC 9.1.0 with -O1 -foptimize-sibling-calls -ftree-vrp
:
#include <stdio.h>
#include <stdlib.h>
struct Node
{
struct Node* child;
};
void walk(struct Node* module, int cleanup)
{
if (module == NULL) {
return;
}
if (!cleanup) {
puts("No cleanup");
}
walk(module->child, cleanup);
if (cleanup) {
free(module);
}
}
int main()
{
struct Node* node = malloc(sizeof(struct Node));
node->child = NULL;
walk(node, 1);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With