Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Undefined behaviour or gcc optimization bug

The question is did we introduce undefined behaviour that trips optimizer, or can we file a bug report against gcc?

Sorry for lack of better title, but it's quite fragile and we are almost sure that this is a bug. The minimal example is not our favourite design but it's based on production code that crashed:

#include <iostream>

struct Node
{
    Node(Node* parent) {
        if(parent) {
           parent->child_ = this;
        }
    }
    Node* child()
    {
        return child_;
    }
    Node* child_ = nullptr;
};

void walk(Node* module, int cleanup) {
    if(module != nullptr) {
        if(!cleanup) {
            std::cerr << "No cleanup";
        }
        walk(module->child(), cleanup);
        if(cleanup) {
            delete module;
        }
    }
}

int main (){
    Node* top = new Node(nullptr);
    Node* child = new Node(top);
    walk(top,1);
}

Compiled with -O1 -foptimize-sibling-calls -ftree-vrp. Godbolt example: https://gcc.godbolt.org/z/4VijKb

Program crashes calling module->child() when module is 0x0. Inspecting assembler we noticed that if (module != nullptr) is skipped at the beginning of walk. There is a check for cleanup and call to work seems to be unconditional, which results in trying to pull child_ from an invalid pointer.

The check is reestablished in assembly (and code seems to work) if:

  1. Any of the two optimizations over -O1 is taken away.
  2. Body of if(!cleanup) is removed. (No side effect from cerr)
  3. Body of if(cleanup) is removed. (Memory leak, but I think it counts as observable behaviour change)
  4. walk is called before "No cleanup" if. (Operation order)
  5. cleanup type is changed to bool from an int. (Type change - but no observable behaviour change, I think).
  6. Insertion of unconditional cerr << "text"; before and if(!cleanup). (Also an observable change.)

It seems like a weird combination of tail-recursion and nullptr check removal that resulted in wrong code. Possibly walk got split into sibling functions based on cleanup checks and got stitched wrongly(?).

Two candidates for UB were:

  1. Hinting compiler that module is non-nullptr, but I don't see a way compiler could infer the result.
  2. Using int in bool context, but it's legal AFAIK.

FWIW clang seems to produce correct run-time, gcc 8.3 also has the assembly for the check present. 9.1 and trunk not. We don't have any gcc expert at hand, so we had no idea why optimizer could be misled.

like image 939
luk32 Avatar asked Jun 18 '19 18:06

luk32


1 Answers

It does look like a GCC bug. I've been staring at this code for a while, and I can't find anything wrong with it whatsoever.

This is also reproducible with gcc, not just g++. It might be easier for the GCC developers to investigate if you write the minimal version of this in C. This C code reproduces the issue for me on GCC 9.1.0 with -O1 -foptimize-sibling-calls -ftree-vrp:

#include <stdio.h>
#include <stdlib.h>

struct Node
{
    struct Node* child;
};

void walk(struct Node* module, int cleanup)
{
    if (module == NULL) {
        return;
    }
    if (!cleanup) {
        puts("No cleanup");
    }
    walk(module->child, cleanup);
    if (cleanup) {
        free(module);
    }
}

int main()
{
    struct Node* node = malloc(sizeof(struct Node));
    node->child = NULL;
    walk(node, 1);
}
like image 191
Nikos C. Avatar answered Oct 09 '22 23:10

Nikos C.