Clang 8.0.0 introduces support for the char8_t type from C++20. I would expect the following two overloads to produce the same compiler output:
#include <algorithm>

// n is unused; both overloads compare exactly 4 elements.
bool compare4(char const* pchA, char const* pchB, int n) {
    return std::equal(pchA, pchA + 4, pchB);
}

bool compare4(char8_t const* pchA, char8_t const* pchB, int n) {
    return std::equal(pchA, pchA + 4, pchB);
}
However, under -std=c++2a -O2 they compile to:
compare4(char const*, char const*, int): # @compare4(char const*, char const*, int)
mov eax, dword ptr [rdi]
cmp eax, dword ptr [rsi]
sete al
ret
_Z8compare4PKDuS0_i: # @_Z8compare4PKDuS0_i
mov al, byte ptr [rdi]
cmp al, byte ptr [rsi]
jne .LBB1_4
mov al, byte ptr [rdi + 1]
cmp al, byte ptr [rsi + 1]
jne .LBB1_4
mov al, byte ptr [rdi + 2]
cmp al, byte ptr [rsi + 2]
jne .LBB1_4
mov al, byte ptr [rdi + 3]
cmp al, byte ptr [rsi + 3]
sete al
ret
.LBB1_4:
xor eax, eax
ret
The second version is clearly less optimized. Is there a reason for this (I couldn't find any in the standard), or is this a bug in Clang?
Sometimes a program is a lot faster when compiled with GCC, sometimes it's a lot faster with Clang. Usually it's marginally faster with GCC. Clang unrolls loops very aggressively: even at -O2, its loop unrolling is more aggressive than GCC's at -O3.
GNU C is a language; GCC is a compiler for that language. Clang defines __GNUC__ / __GNUC_MINOR__ / __GNUC_PATCHLEVEL__ according to the version of GCC that it claims full compatibility with.
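To see which of these macros a given compiler exposes, a small helper like the one below can be compiled with both GCC and Clang. This is only an illustrative sketch: the function name compiler_macros is hypothetical, and the values it reports depend entirely on the compiler version in use.

```cpp
#include <string>

// Hypothetical helper: reports the compiler-identification macros that are visible.
std::string compiler_macros() {
    std::string out;
#if defined(__clang__)
    // Clang's own version macros.
    out += "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__) + "; ";
#endif
#if defined(__GNUC__)
    // Under Clang these describe the GCC version it claims compatibility with,
    // not the version of Clang itself.
    out += "__GNUC__ " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__) + "." +
           std::to_string(__GNUC_PATCHLEVEL__);
#else
    out += "__GNUC__ not defined";
#endif
    return out;
}
```

Printing the returned string under each compiler shows that checking __GNUC__ alone does not distinguish GCC from Clang; __clang__ has to be tested as well.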
Here Clang is the frontend and LLVM is the backend. LLVM defines a common intermediate representation (IR) based on static single assignment (SSA) form, which makes many optimizations easy to perform on the IR.
clang -cc1 is the frontend; clang is the driver. The driver invokes the frontend with options appropriate for your system. To see these options, run: $ clang -### -c hello.c. Some clang command-line options are driver-only, and some are frontend-only.
In libstdc++, std::equal calls __builtin_memcmp when it detects that the arguments are "simple"; otherwise it uses a naive for loop. "Simple" here means pointers (or certain iterator wrappers around pointers) to the same integer or pointer type.
libstdc++ detects integer types with its internal __is_integer trait, but libstdc++ 8.2.0 (the version used on godbolt.org) does not specialize this trait for char8_t, so the latter is not detected as an integer type.
Clang (with this particular configuration) generates more verbose assembly in the for loop case than in the __builtin_memcmp case. (But the former is not necessarily less optimized in terms of performance; see loop unrolling.)
So there's a reason for this difference, and it's not a bug in Clang, IMO.
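The dispatch described above can be imitated with a much-simplified sketch. This is not libstdc++'s actual code: is_known_integer, equal_impl and equal_sketch are hypothetical names, the trait only stands in for the internal __is_integer, and utf8_char falls back to a stand-in enum when the compiler does not provide char8_t.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

#ifdef __cpp_char8_t
using utf8_char = char8_t;           // the real type when the compiler provides it
#else
enum utf8_char : unsigned char {};   // stand-in so the sketch also builds before C++20
#endif

// Hypothetical trait playing the role of libstdc++'s internal __is_integer.
template <typename T> struct is_known_integer : std::false_type {};
template <> struct is_known_integer<char> : std::true_type {};
// Deliberately no specialization for utf8_char, mirroring libstdc++ 8.2.0.

// Fast path: a single bulk byte comparison.
template <typename T>
bool equal_impl(const T* first, const T* last, const T* other, std::true_type) {
    const std::size_t bytes = static_cast<std::size_t>(last - first) * sizeof(T);
    return std::memcmp(first, other, bytes) == 0;
}

// Fallback: naive element-by-element loop.
template <typename T>
bool equal_impl(const T* first, const T* last, const T* other, std::false_type) {
    for (; first != last; ++first, ++other) {
        if (!(*first == *other)) return false;
    }
    return true;
}

// Chooses the path at compile time from the trait (tag dispatch).
template <typename T>
bool equal_sketch(const T* first, const T* last, const T* other) {
    return equal_impl(first, last, other, is_known_integer<T>{});
}
```

With this sketch, char pointers take the memcmp path while utf8_char pointers fall through to the loop, which is the same split that produces the two different assembly listings in the question. Both paths return the same result; only the generated code differs. If the bulk comparison matters, calling std::memcmp directly on the char8_t buffers should sidestep the trait check.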