Why does (infinite) recursion give different results with and w/o -O3 in clang (and gcc/g++)?

Tags:

When I compile and run this code

#include <stdio.h>

int collatz(int a) {
    return a == 1 ? 1 : collatz((a%2) ? 3*a+1 : a/2);
}

int main() {
    for (int a = 0; a < 10; ++a)
        printf("%d: %d\n", a, collatz(a));
}

in clang (10.0.0-4ubuntu1) it crashes, but when I add -O3 it runs fine. (This holds both when passing this as C or C++ code to clang.)

I understand that it crashes without the optimization flag, since the call of collatz with 0 will lead to an infinite recursion. However, with -O3 it returns 0: 1, which is wrong. Is that a (known) bug in the optimization or am I missing something?

(gcc/g++ (9.3.0) also crash without optimization, and the compilation hangs with -O3.)

Edit: Just to add that I am not looking for the reason why the program hangs / fails (that is clear) but why the compiler apparently has the liberty to return some value whilst the program should not return (but enter an infinite loop).

Edit 2: Why haven't I picked a more minimal example? Well, here it is, this gives the same behavior

int divide(int i) {
  return i == 1 ? 1 : divide(i/2);
}

819

asked Aug 30 '21 12:08

fuenfundachtzig

2 Answers

Is that a (known) bug in the optimization or am I missing something?

It's not a bug in optimisation. It's a bug in the program that is being compiled. The behaviour of the program is undefined in C++.

it returns 0: 1, which is wrong

There is no "wrong" behaviour when the behaviour of the program is undefined.

but the compilation hangs with -O3

This is probably a compiler bug.

how can the compiler just decide to return something, whereas, if you follow the program logic, it should not.

When the behaviour of the program is undefined, the compiler can decide to do anything or to do nothing. There's nothing in particular that it should do or should not do. An undefined program has no logic to it.

why the behaviour is undefined

The C++ standard says:

[intro.progress]

The implementation may assume that any thread will eventually do one of the following:

terminate,

make a call to a library I/O function,

perform an access through a volatile glvalue, or

perform a synchronization operation or an atomic operation.

In case the argument is 0, the thread would never do any of those things, and as such the [language] implementation is allowed to assume that the function is never called with the argument of 0. When you call the function with such argument, you contradict this assumption and behaviour is undefined.

101

answered Oct 19 '22 15:10

eerorika

Note: my answer is only true in C. As said in other answers in C++ the program has UB in it. This is one of the reason you shouldn't compile C code with a C++ compiler.

I tried to get the assembly generated by clang 10.0.0 through godbolt. I understand not everyone knows how to read assembly so I'll try to explain succintly what we can deduce from the listings. Here's what I generated:

With clang -S main.c:

collatz(int):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
        str     r0, [r11, #-4]
        ldr     r0, [r11, #-4]
        cmp     r0, #1
        bne     .LBB0_2
        b       .LBB0_1
.LBB0_1:
        mov     r0, #1
        str     r0, [sp, #8]            @ 4-byte Spill
        b       .LBB0_6
.LBB0_2:
        ldr     r0, [r11, #-4]
        add     r1, r0, r0, lsr #31
        bic     r1, r1, #1
        sub     r0, r0, r1
        cmp     r0, #0
        beq     .LBB0_4
        b       .LBB0_3
.LBB0_3:
        ldr     r0, [r11, #-4]
        add     r0, r0, r0, lsl #1
        add     r0, r0, #1
        str     r0, [sp, #4]            @ 4-byte Spill
        b       .LBB0_5
.LBB0_4:
        ldr     r0, [r11, #-4]
        add     r0, r0, r0, lsr #31
        asr     r0, r0, #1
        str     r0, [sp, #4]            @ 4-byte Spill
        b       .LBB0_5
.LBB0_5:
        ldr     r0, [sp, #4]            @ 4-byte Reload
        bl      collatz(int)
        str     r0, [sp, #8]            @ 4-byte Spill
        b       .LBB0_6
.LBB0_6:
        ldr     r0, [sp, #8]            @ 4-byte Reload
        mov     sp, r11
        pop     {r11, lr}
        bx      lr
main:
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #16
        mov     r0, #0
        str     r0, [r11, #-4]
        str     r0, [sp, #8]
        b       .LBB1_1
.LBB1_1:                                @ =>This Inner Loop Header: Depth=1
        ldr     r0, [sp, #8]
        cmp     r0, #9
        bgt     .LBB1_4
        b       .LBB1_2
.LBB1_2:                                @   in Loop: Header=BB1_1 Depth=1
        ldr     r0, [sp, #8]
        str     r0, [sp, #4]            @ 4-byte Spill
        bl      collatz(int)
        ldr     r1, .LCPI1_0
        str     r0, [sp]                @ 4-byte Spill
        mov     r0, r1
        ldr     r1, [sp, #4]            @ 4-byte Reload
        ldr     r2, [sp]                @ 4-byte Reload
        bl      printf
        b       .LBB1_3
.LBB1_3:                                @   in Loop: Header=BB1_1 Depth=1
        ldr     r0, [sp, #8]
        add     r0, r0, #1
        str     r0, [sp, #8]
        b       .LBB1_1
.LBB1_4:
        ldr     r0, [r11, #-4]
        mov     sp, r11
        pop     {r11, lr}
        bx      lr
.LCPI1_0:
        .long   .L.str
.L.str:
        .asciz  "%d: %d\n"

Here, things look pretty normal, the function is defined, its behavior looks okay, the loop is formed correctly in the main so as we can expect, when it calls collatz(0), the stack overflows.

With clang -S -O3 main.c:

collatz(int):
        mov     r0, #1
        bx      lr
main:
        push    {r4, r10, r11, lr}
        add     r11, sp, #8
        ldr     r4, .LCPI1_0
        mov     r1, #0
        mov     r2, #1
        mov     r0, r4
        bl      printf
        mov     r0, r4
        mov     r1, #1
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #2
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #3
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #4
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #5
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #6
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #7
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #8
        mov     r2, #1
        bl      printf
        mov     r0, r4
        mov     r1, #9
        mov     r2, #1
        bl      printf
        mov     r0, #0
        pop     {r4, r10, r11, lr}
        bx      lr
.LCPI1_0:
        .long   .L.str
.L.str:
        .asciz  "%d: %d\n"

clang is completely drunk here. It did optimize the loop as we would expect but it decided that collatz(0) = 1 for whatever reason and it compiled a very weird version of collatz only to never call it. You may say clang is a C++ compiler so it makes sense for it to do unexpected things but that's not true, clang advertise itself as "the Clang C, C++, and Objective-C compiler". To me, it's a compiler bug. Let's try with clang 12.0.1 just in case the bug was solved in a later version.

With the latest version clang -S -O3 main.c:

    .text
    .file   "main.c"
    .globl  collatz                         # -- Begin function collatz
    .p2align    4, 0x90
    .type   collatz,@function
collatz:                                # @collatz
    .cfi_startproc
# %bb.0:
                                        # kill: def $edi killed $edi def $rdi
    cmpl    $1, %edi
    jne .LBB0_2
    jmp .LBB0_5
    .p2align    4, 0x90
.LBB0_3:                                #   in Loop: Header=BB0_2 Depth=1
    leal    (%rdi,%rdi,2), %edi
    addl    $1, %edi
    cmpl    $1, %edi
    je  .LBB0_5
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
    testb   $1, %dil
    jne .LBB0_3
# %bb.4:                                #   in Loop: Header=BB0_2 Depth=1
    movl    %edi, %eax
    shrl    $31, %eax
    addl    %edi, %eax
    sarl    %eax
    movl    %eax, %edi
    cmpl    $1, %edi
    jne .LBB0_2
.LBB0_5:
    movl    $1, %eax
    retq
.Lfunc_end0:
    .size   collatz, .Lfunc_end0-collatz
    .cfi_endproc
                                        # -- End function
    .globl  main                            # -- Begin function main
    .p2align    4, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# %bb.0:
    .p2align    4, 0x90
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
    jmp .LBB1_1
.Lfunc_end1:
    .size   main, .Lfunc_end1-main
    .cfi_endproc
                                        # -- End function
    .ident  "clang version 12.0.1"
    .section    ".note.GNU-stack","",@progbits
    .addrsig

Here, the main function consists of a while(1) loop, which is what we expect given the input.

To conclude, I think both gcc and clang 10.0.0 have a bug, though one is easier to see as a bug than the other

answered Oct 19 '22 14:10

Tzig

Related questions
                            
                                Can this member function selection code be written without std::invoke?
                            
                                Does the force kill command (kill -9) in linux cleanup the dynamically allocated memory with new operator in C++ application?
                            
                                Can I specialize forward declared template?
                            
                                More than one operator overload in Rust
                            
                                What kind of value does a pointer hold after using it to explicitly call the pointed object's destructor?
                            
                                Object rvalue propagation for member function calls
                            
                                Why is the symbol of member function weak?
                            
                                C++ syntax question - Why can't I use comma to separate variable definition of different types
                            
                                Multiply-add vectorization slower with AVX than with SSE
                            
                                Forcing inlining of lambda in MSVC C++
                            
                                Possibly learning old C++ standard
                            
                                Why do printf and isnan disagree whether a long double value is a NaN?
                            
                                Why are time_of_day and hh_mm_ss two different types in C++20?
                            
                                Why does views::reverse not work with iota_view<int64_t, int64_t>
                            
                                Overload resolution between va_list and ellipsis differs based on the literal value. Why?
                            
                                What is the purpose of (void) casts in the GCC standard library implementation?
                            
                                Raise compile-time error if a string has whitespace
                            
                                Creating a base case for Variadic Template recursion with no template arguments
                            
                                c++ list not resizing after set difference
                            
                                What is a nice way to cycle through an enum?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does (infinite) recursion give different results with and w/o -O3 in clang (and gcc/g++)?

Tags:

c++

c

compiler-optimization

clang

fuenfundachtzig

People also ask

2 Answers

eerorika

Tzig

Recent Activity

Donate For Us