Please consider the following code:
float test(int len, int* tab)
{
    for (int i = 0; i < len; i++)
        tab[i] = i;
}
Obviously the return statement is missing. In this scenario both clang and the NDK compiler for ARM generate an infinite loop. The disassembly makes it clear that the compiler emits an unconditional branch instead of a conditional one:
        mov     r0, #0
.LBB0_1:
        str     r0, [r1, r0, lsl #2]
        add     r0, r0, #1
        b       .LBB0_1
The failing example can be found here: https://godbolt.org/z/YDSFw-
Please note that the C++ specification states that a missing return is undefined behaviour, but it refers only to the returned value. It shall not affect the preceding instructions.
Am I missing something here? Any thoughts?
No, you can't reason that way with undefined behaviour.
The compiler is free to exploit undefined behaviour, and the assumptions that follow from it, when optimizing. In particular, it may assume that your program never executes an operation with undefined behaviour.
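A classic illustration of the same principle, unrelated to return statements: signed integer overflow is undefined behaviour, so an optimizing compiler may assume it never happens. The function name below is hypothetical, and whether a particular compiler actually folds it depends on the compiler and optimization level.

```cpp
// Signed integer overflow is undefined behaviour, so the compiler may assume
// that x + 1 never wraps around. Under that assumption x + 1 > x is always
// true, and an optimizing compiler is allowed to fold the whole function to
// "return true;" without ever computing x + 1.
bool is_successor_greater(int x) {
    return x + 1 > x;
}
```

For any x other than INT_MAX the result is simply true; for INT_MAX the program has undefined behaviour, so the compiler owes you nothing there.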
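In this case, the compiler can assume that the code with undefined behaviour won't be reached. Since flowing off the end of the function is undefined behaviour, the compiler concludes that the end of the function is never reached. The only way to get there is for the loop condition i < len to become false, so the compiler may treat that condition as always true and drop it, which leaves exactly the unconditional loop you see.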
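The same inference can be requested deliberately when the programmer really does know a point is unreachable. A minimal sketch using the GCC/Clang builtin __builtin_unreachable(); the function first_index_of and its precondition are hypothetical, chosen to mirror the shape of test():

```cpp
// Returns the index of the first element of tab[0..len) equal to key.
// Precondition (assumed, not checked): key occurs somewhere in tab[0..len).
int first_index_of(const int* tab, int len, int key) {
    for (int i = 0; i < len; i++) {
        if (tab[i] == key)
            return i;
    }
    // GCC/Clang builtin: reaching this point is undefined behaviour. Given
    // the precondition it never is, and the optimizer may use that to drop
    // the i < len check, the same inference it made for test() above.
    __builtin_unreachable();
}
```

The difference from the question is that here the unreachable end is a documented promise, whereas in test() it was an accident.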
If you remove -Oz and add -emit-llvm to the Compiler Explorer command, you'll see the LLVM IR that clang produces originally, before any optimization:
https://godbolt.org/z/-dbeNj
define dso_local float @_Z4testiPi(i32 %0, i32* %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32*, align 4
  %5 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32* %1, i32** %4, align 4
  store i32 0, i32* %5, align 4
  br label %6

6:                                    ; preds = %15, %2
  %7 = load i32, i32* %5, align 4
  %8 = load i32, i32* %3, align 4
  %9 = icmp slt i32 %7, %8
  br i1 %9, label %10, label %18

10:                                   ; preds = %6
  %11 = load i32, i32* %5, align 4
  %12 = load i32*, i32** %4, align 4
  %13 = load i32, i32* %5, align 4
  %14 = getelementptr inbounds i32, i32* %12, i32 %13
  store i32 %11, i32* %14, align 4
  br label %15

15:                                   ; preds = %10
  %16 = load i32, i32* %5, align 4
  %17 = add nsw i32 %16, 1
  store i32 %17, i32* %5, align 4
  br label %6

18:                                   ; preds = %6
  call void @llvm.trap()
  unreachable
}
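The block after the loop, label 18, ends in unreachable. The optimizer can use this to remove the comparison and the conditional branch at the start of the loop: since the false edge of the icmp leads only to unreachable code, it may assume the condition is always true.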
Edit: There's an excellent blog post from John Regehr about how to reason around undefined behaviour in C and C++. It's a bit long but well worth a read.