I wrote a simple C++ function to check how the compiler optimizes it:
bool f1(bool a, bool b) {
    return !a || (a && b);
}
After that I checked the equivalent in Rust:
fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
I used Godbolt to check the assembly output.
The result for the C++ code (compiled by clang with the -O3 flag) is the following:
f1(bool, bool): # @f1(bool, bool)
xor dil, 1
or dil, sil
mov eax, edi
ret
And the result for the Rust equivalent is much longer:
example::f1:
push rbp
mov rbp, rsp
mov al, sil
mov cl, dil
mov dl, cl
xor dl, -1
test dl, 1
mov byte ptr [rbp - 3], al
mov byte ptr [rbp - 4], cl
jne .LBB0_1
jmp .LBB0_3
.LBB0_1:
mov byte ptr [rbp - 2], 1
jmp .LBB0_4
.LBB0_2:
mov byte ptr [rbp - 2], 0
jmp .LBB0_4
.LBB0_3:
mov al, byte ptr [rbp - 4]
test al, 1
jne .LBB0_7
jmp .LBB0_6
.LBB0_4:
mov al, byte ptr [rbp - 2]
and al, 1
movzx eax, al
pop rbp
ret
.LBB0_5:
mov byte ptr [rbp - 1], 1
jmp .LBB0_8
.LBB0_6:
mov byte ptr [rbp - 1], 0
jmp .LBB0_8
.LBB0_7:
mov al, byte ptr [rbp - 3]
test al, 1
jne .LBB0_5
jmp .LBB0_6
.LBB0_8:
test byte ptr [rbp - 1], 1
jne .LBB0_1
jmp .LBB0_2
I also tried the -O option, but then the output is empty (the unused function is deleted).
I am intentionally NOT using any library in order to keep the output clean. Please notice that both clang and rustc use LLVM as a backend. What explains this huge difference in output? And if it is only a matter of optimizations being switched off, how can I see optimized output from rustc?
Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):
push rbp
mov rbp, rsp
xor dil, 1
or dil, sil
mov eax, edi
pop rbp
ret
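For reference, the source behind this listing is presumably just the question's function with pub added. The sketch below assumes it is compiled as a library crate; without pub, the function is unused and the optimizer drops it, which is why the question saw empty output with -O.

```rust
// Marked `pub` so the function is part of the crate's public interface
// and is not discarded as unused when optimizations are enabled.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
```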
A few things:
Why is it still longer than the C++ version?
The Rust version is exactly three instructions longer:
push rbp
mov rbp, rsp
[...]
pop rbp
These are instructions to manage the so-called frame pointer or base pointer (rbp). It is mainly required to get nice stack traces. If you enable it for the C++ version via -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++ since I haven't found a comparable option for the clang compiler.
Why doesn't Rust omit the frame pointer?
Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel", you get this output:
f1:
xor dil, 1
or dil, sil
mov eax, edi
ret
Which is exactly the output of your C++ version.
You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.
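To see why three instructions are enough, note that the expression from the question reduces to "not a, or b": the xor flips a and the or combines the result with b. The sketch below checks this on all four inputs; f1_like_asm and agree_on_all_inputs are hypothetical helper names introduced only for this illustration.

```rust
/// The expression exactly as written in the question.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hypothetical helper mirroring the optimized assembly:
/// `xor dil, 1` flips `a`, then `or dil, sil` ORs the result with `b`.
pub fn f1_like_asm(a: bool, b: bool) -> bool {
    (a ^ true) | b
}

/// Returns true if both functions agree on all four input combinations.
pub fn agree_on_all_inputs() -> bool {
    [false, true]
        .iter()
        .all(|&a| [false, true].iter().all(|&b| f1(a, b) == f1_like_asm(a, b)))
}
```

Since a bool has only two values, four cases cover the whole input space, so the check is exhaustive.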
Why -O instead of --release?
Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).
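If you are unsure which kind of build you are looking at, one rough check is sketched below. It assumes the default behavior where rustc turns debug assertions on only at opt-level 0, and that they were not overridden with -C debug-assertions. The helper name build_kind is hypothetical.

```rust
/// Hypothetical helper: reports whether debug assertions are compiled in.
/// By default rustc enables them only at opt-level 0, so this is a rough
/// proxy for "was this built without optimizations?".
pub fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug assertions on (default for opt-level 0)"
    } else {
        "debug assertions off (default for opt-level 1 and above)"
    }
}
```

This only reflects the defaults; it does not read the optimization level itself.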