
Compiler error in ndk and clang++ for ARM?

Please consider following code:

float test(int len, int* tab)
{
    for(int i = 0; i<len; i++)
        tab[i] = i;
}

The return statement is obviously missing. In this case, both clang and the NDK compiler generate an infinite loop when targeting ARM. The disassembly shows that the compiler emits an unconditional branch instead of a conditional one:

    mov     r0, #0
.LBB0_1:
    str     r0, [r1, r0, lsl #2]
    add     r0, r0, #1
    b       .LBB0_1

The example with an error can be found here: https://godbolt.org/z/YDSFw-

Please note that the C++ specification treats a missing return as undefined behaviour, but as I understand it this concerns only the returned value. It should not affect the preceding instructions.

Am I missing something here? Any thoughts?

asked Nov 26 '19 10:11 by no one special


1 Answer

No, you can't reason that way with undefined behaviour.

The compiler is free to assume that your code contains no undefined behaviour, and to optimize based on that assumption.

In this case, the compiler can assume that code with undefined behaviour is never reached. Flowing off the end of a non-void function without a return is undefined behaviour, so the compiler concludes that the end of the function is never reached. That in turn means the loop never exits, so the loop condition can be optimized away.
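For comparison, here is a minimal sketch of the function with a return on every path. The returned value (the element count as a float) is only an assumption for illustration, since the question doesn't say what the function was meant to return. With the return in place, falling out of the loop is well-defined, so the compiler has to keep the i < len check.

```cpp
// Same loop as in the question, but the function now returns a value,
// so reaching the end of the loop is no longer undefined behaviour.
float test(int len, int* tab)
{
    for (int i = 0; i < len; i++)
        tab[i] = i;
    // Hypothetical return value: the original code does not show what
    // was intended here.
    return static_cast<float>(len);
}
```

Both clang and GCC have a -Wreturn-type warning for this kind of mistake, and -Werror=return-type turns it into a hard error.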

If you remove the -Oz and add -emit-llvm to the compiler explorer command, you'll see what LLVM IR clang produces originally, when not doing optimizations: https://godbolt.org/z/-dbeNj

define dso_local float @_Z4testiPi(i32 %0, i32* %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32*, align 4
  %5 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32* %1, i32** %4, align 4
  store i32 0, i32* %5, align 4
  br label %6

6:                                                ; preds = %15, %2
  %7 = load i32, i32* %5, align 4
  %8 = load i32, i32* %3, align 4
  %9 = icmp slt i32 %7, %8
  br i1 %9, label %10, label %18

10:                                               ; preds = %6
  %11 = load i32, i32* %5, align 4
  %12 = load i32*, i32** %4, align 4
  %13 = load i32, i32* %5, align 4
  %14 = getelementptr inbounds i32, i32* %12, i32 %13
  store i32 %11, i32* %14, align 4
  br label %15

15:                                               ; preds = %10
  %16 = load i32, i32* %5, align 4
  %17 = add nsw i32 %16, 1
  store i32 %17, i32* %5, align 4
  br label %6

18:                                               ; preds = %6
  call void @llvm.trap()
  unreachable
}

The block after the loop, label 18, contains unreachable. Since the only way to get there is the loop condition failing, the optimizer can assume the condition is always true and get rid of the comparison and conditional branch at the start of the loop.
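The same mechanism can be seen in a case that is safe to run. The sketch below uses the GCC/Clang builtin __builtin_unreachable() to mark a path as never taken. The names lookup and kTable are made up for this illustration. As long as callers only pass indices 0 to 3 the program is well-defined, and an optimizing compiler is allowed (though not required) to drop the range check entirely, just as it dropped the loop condition above.

```cpp
// Hypothetical lookup table, used only for this illustration.
static const int kTable[4] = {10, 20, 30, 40};

// The caller promises 0 <= i < 4.
int lookup(int i)
{
    if (i < 0 || i >= 4) {
        // Tells the optimizer this path never happens, much like the
        // `unreachable` in block 18. Actually reaching it is undefined
        // behaviour, so the check above may be removed.
        __builtin_unreachable();
    }
    return kTable[i];
}
```

Calling lookup(2) returns 30. Calling lookup(7) is undefined behaviour, and you can't count on the check to catch it.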

Edit: There's an excellent blog post from John Regehr about how to reason around undefined behaviour in C and C++. It's a bit long but well worth a read.

answered Oct 19 '22 19:10 by mstorsjo