Is there a flaw in how clang implements char8_t or does some dark corner of the standard prohibit optimization?

Clang 8.0.0 introduces support for the char8_t type from C++20. I would expect the following two functions to produce the same compiler output:

#include <algorithm>

bool compare4(char const* pchA, char const* pchB, int n) {
    return std::equal(pchA, pchA+4, pchB);
}

bool compare4(char8_t const* pchA, char8_t const* pchB, int n) {
    return std::equal(pchA, pchA+4, pchB);
}

However, under -std=c++2a -O2 they compile to:

compare4(char const*, char const*, int):   # @compare4(char const*, char const*, int)
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsi]
        sete    al
        ret
_Z8compare4PKDuS0_i:                       # @_Z8compare4PKDuS0_i
        mov     al, byte ptr [rdi]
        cmp     al, byte ptr [rsi]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 1]
        cmp     al, byte ptr [rsi + 1]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 2]
        cmp     al, byte ptr [rsi + 2]
        jne     .LBB1_4
        mov     al, byte ptr [rdi + 3]
        cmp     al, byte ptr [rsi + 3]
        sete    al
        ret
.LBB1_4:
        xor     eax, eax
        ret

The latter is clearly less optimized. Is there a reason for this (I couldn't find one in the standard), or is this a bug in clang?

Does Clang optimize better than GCC?

Sometimes a program is a lot faster when compiled with GCC, sometimes it's a lot faster with Clang; usually it's marginally faster with GCC. Clang unrolls loops very aggressively: even at -O2, its unrolling attempts are more aggressive than GCC's at -O3.

Does Clang define __GNUC__?

GNU C is a language; GCC is a compiler for that language. Clang defines __GNUC__ / __GNUC_MINOR__ / __GNUC_PATCHLEVEL__ according to the version of GCC that it claims full compatibility with.
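A practical consequence is that code which wants to tell the two compilers apart has to test __clang__ before __GNUC__. The sketch below illustrates that ordering; compiler_id and version_string are hypothetical helper names, not part of any library. Clang has long reported GCC 4.2.1 through these macros by default, though that is worth confirming for a given toolchain.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: formats "major.minor" from two macro values.
std::string version_string(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    return std::to_string(major) + "." + std::to_string(minor);
}

// Hypothetical helper: describes the compiler building this translation unit.
std::string compiler_id() {
#if defined(__clang__)
    // Test __clang__ first: Clang usually defines __GNUC__ as well.
    std::string id = "clang " + version_string(__clang_major__, __clang_minor__);
  #if defined(__GNUC__)
    // The GCC version Clang claims compatibility with, not a real GCC version.
    id += " (reports GNU C " + version_string(__GNUC__, __GNUC_MINOR__) + ")";
  #endif
    return id;
#elif defined(__GNUC__)
    return "gcc " + version_string(__GNUC__, __GNUC_MINOR__);
#else
    return "other";
#endif
}
```

Checking __GNUC__ alone would classify Clang as GCC, which is why the order of the branches matters.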

What is the difference between Clang and LLVM?

Clang is the frontend and LLVM is the backend. LLVM defines a common intermediate representation (IR) based on static single assignment (SSA) form, which makes many optimizations easy to perform on the IR.

What is cc1 Clang?

clang -cc1 is the frontend, clang is the driver. The driver invokes the frontend with options appropriate for your system. To see these options, run: $ clang -### -c hello.c. Some clang command line options are driver-only options, some are frontend-only options.

- In libstdc++, std::equal calls __builtin_memcmp when it detects that the arguments are "simple"; otherwise it uses a naive for loop. "Simple" here means pointers (or certain iterator wrappers around pointers) to the same integer or pointer type. (relevant source code)
- Whether a type is an integer type is detected by the internal __is_integer trait, but libstdc++ 8.2.0 (the version used on godbolt.org) does not specialize this trait for char8_t, so char8_t is not detected as an integer type. (relevant source code)
- Clang (with this particular configuration) generates more verbose assembly in the for loop case than in the __builtin_memcmp case. ~~(But the former is not necessarily less optimized in terms of performance. See Loop_unrolling.)~~
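The dispatch described above can be sketched roughly as follows. This is a simplified illustration, not libstdc++'s actual code: is_known_integer stands in for the internal __is_integer trait, equal_impl and equal_n are hypothetical names, and byte_like is a made-up byte-sized type playing the role of char8_t so the example also builds before C++20. The Path out-parameter exists only to make the chosen branch observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for libstdc++'s internal __is_integer trait:
// only explicitly specialized types count as "simple".
template <class T> struct is_known_integer : std::false_type {};
template <> struct is_known_integer<char> : std::true_type {};
template <> struct is_known_integer<signed char> : std::true_type {};
template <> struct is_known_integer<unsigned char> : std::true_type {};

// Byte-sized type with no specialization, standing in for char8_t.
enum class byte_like : unsigned char {};

enum class Path { Memcmp, Loop };

// Primary template: element-by-element comparison (the "naive" loop).
template <bool Simple>
struct equal_impl {
    template <class T>
    static bool run(const T* a, const T* b, std::size_t n, Path& used) {
        used = Path::Loop;
        for (std::size_t i = 0; i < n; ++i)
            if (!(a[i] == b[i])) return false;
        return true;
    }
};

// Specialization for "simple" types: a single memcmp call.
template <>
struct equal_impl<true> {
    template <class T>
    static bool run(const T* a, const T* b, std::size_t n, Path& used) {
        used = Path::Memcmp;
        return n == 0 || std::memcmp(a, b, n * sizeof(T)) == 0;
    }
};

// Picks the implementation at compile time based on the trait.
template <class T>
bool equal_n(const T* a, const T* b, std::size_t n, Path& used) {
    assert(n == 0 || (a && b));
    return equal_impl<is_known_integer<T>::value>::run(a, b, n, used);
}
```

With this layout, adding one more specialization of the trait is enough to move a type from the loop path to the memcmp path, which is the kind of gap the answer attributes to libstdc++ 8.2.0 and char8_t.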

So there's a reason for this difference, and it's not a bug in clang IMO.
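If the compact code matters for a hot path, one possible workaround is to bypass std::equal and call memcmp directly, which is the path the char overload already takes. The following is a minimal sketch under that assumption; compare4_memcmp is a hypothetical helper name, and whether a given compiler lowers the call to a single 32-bit compare should be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: compares exactly four byte-sized characters.
// Works for char, unsigned char and, in C++20, char8_t.
template <class CharT>
bool compare4_memcmp(const CharT* a, const CharT* b) {
    static_assert(std::is_integral<CharT>::value && sizeof(CharT) == 1,
                  "intended for byte-sized character types");
    assert(a && b);
    // Fixed length known at compile time, so the call can be inlined.
    return std::memcmp(a, b, 4) == 0;
}
```

For byte-sized integer types, bytewise equality and element-wise equality agree, so the result matches what std::equal would return for the same four elements.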

Tags: c++, compiler-optimization, clang, c++20

Tobi

1 Answers

cpplearner
