Firstly, a trivial mathematical fact: given integers n and m, we have n < m if, and only if, n <= m - 1.
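Just to make the identity concrete at the boundary value used below, here are a couple of compile-time checks (illustration only, not part of the argument):

// Illustration only: n < m and n <= m - 1 agree at the boundary n == m - 1
// and just past it.
static_assert(1000 < 1001 && 1000 <= 1001 - 1, "n == m - 1");
static_assert(!(1001 < 1001) && !(1001 <= 1001 - 1), "n == m");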
GCC seems to prefer immediate values of smaller absolute value. Hence, when m is known and other conditions are met, the compiler chooses, among equivalent comparison expressions, the one minimizing the absolute value of the immediate. For instance, it prefers n <= 1000 over n < 1001, and GCC 9.2 translates this
bool f(uint32_t n) {
    return n < 1001;
}
into this x86 assembly code:
f(unsigned int):
        cmpl    $1000, %edi
        setbe   %al
        ret
There might be good performance reasons for that, but that's not my question. What I'd like to know is this: Is there a way to force GCC to keep the original comparison? More specifically, I'm not worried about portability, so GCC specifics (options, pragmas, attributes, ...) are OK for me. However, I'm looking for a constexpr-friendly solution, which seems to exclude inline asm. Finally, I'm targeting C++17, which excludes things like std::is_constant_evaluated. (Having said that, please feel free to provide answers regardless of my constraints, because they might still be useful for others.)
You might ask why I want to do such a thing. Here we go. To my understanding (please correct me if I'm wrong), this behavior might be a "pessimization" for x86_64 in the following example:
bool g(uint64_t n) {
    n *= 5000000001;
    return n < 5000000001;
}
which is translated by GCC 6.2 into
g(unsigned long):
        movabsq $5000000001, %rax
        imulq   %rax, %rdi
        movabsq $5000000000, %rax
        cmpq    %rax, %rdi
        setbe   %al
        ret
In x86_64, computations with 64-bit immediate values have some restrictions that might require loading these values into registers. In the above example, this happens twice: the constants 5000000001 and 5000000000 are loaded into rax, one for the multiplication and one for the comparison. Had GCC kept the original comparison as it appears in the C++ code (i.e., against 5000000001), there would be no need for the second movabs.
This also implies a code-size penalty which, I guess, was considered an issue, and more recent versions of GCC (e.g. 9.2) produce this:
g(unsigned long):
        movabsq $5000000001, %rax
        imulq   %rax, %rdi
        subq    $1, %rax
        cmpq    %rax, %rdi
        setbe   %al
        ret
Hence, the 10-byte-long movabs was replaced by a 4-byte-long subq instruction. In any case, the subq also seems unnecessary.
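As an aside, both movabs instructions only show up because 5000000001 does not fit in a sign-extended 32-bit immediate. For contrast, here is a contrived sketch with a constant that does fit; I would expect GCC to encode the immediate directly in imul and cmp, with no movabs at all:

// Contrast sketch only: 2000000001 fits in a sign-extended 32-bit immediate,
// so neither the multiplication nor the comparison should need a movabs.
bool g_small(uint64_t n) {
    n *= 2000000001;
    return n < 2000000001;   // GCC may still canonicalize this to n <= 2000000000
}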
In general, the only way to force GCC to output a particular instruction sequence is to use assembly. If GCC isn't generating optimal code, the best thing to do is to submit a bug report so that, hopefully, it'll be improved in future versions.
For your specific case there's at least one workaround that can generate the code you want, at least with current compilers. While it doesn't require writing the entire code sequence in assembly, it does use inline assembly to load the constant into a register, on the assumption that GCC will prefer to reuse this register rather than generate a new constant value for the comparison.
#include <stdint.h>

static uint64_t
load_const_asm(uint64_t c) {
    // Only bother for compile-time constants that don't fit in a
    // sign-extended 32-bit immediate.
    if (__builtin_constant_p(c) && (int32_t) c != c) {
        asm("" : "+r" (c));  // force c into a register; its value is now opaque
    }
    return c;
}
bool
g(uint64_t n) {
    uint64_t c = load_const_asm(5000000001);
    n *= c;
    return n < c;
}
which generates the following code when compiled with -O on GCC 9.2:
_Z1gm:
        movabs  rax, 5000000001
        imul    rdi, rax
        cmp     rdi, rax
        setb    al
        ret
This asm statement tricks the compiler into not generating a separate load of the constant minus one, because the compiler doesn't know what the asm statement does, even though it's just an empty string. It forces the constant into a register, but the compiler has no idea what that register will contain after the asm statement "executes".
The advantage of doing it this way, over using an asm statement with a CMP instruction, is that it gives the compiler the most freedom to optimize your code. This is a big disadvantage of inline assembly in general: it's easy for it to end up pessimizing your code. For example, using inline assembly prevents the compiler from calculating the result of g() at compile time if n is a constant. However, with my code example above, it's still able to figure out that g() must return false if n is 1.
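To illustrate, here's a contrived caller, just as a sketch; the exact output depends on the GCC version and options:

// Contrived example: with n == 1 the product equals the constant itself, so
// n < c is always false, and GCC is expected to fold the call even though the
// constant load goes through the opaque asm statement.
bool h() {
    return g(1);   // expected to compile down to "return false"
}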
Finally, make sure that you're not trying to prematurely optimize your code. You don't want to be obfuscating your code like this if it's not going to make any noticeable difference to the performance of your application. If you do end up using this code, or some other hack, then as Peter Cordes said in a comment, you should clearly comment your code to explain why the hack is necessary, so it can be removed when it's no longer needed.
Here is a C++20 solution which, sadly, I cannot use:
#include <cstdint>
#include <type_traits>

template <class U>
bool less_asm(U n, U m) noexcept {
    bool r;
    asm("cmp%z[m]\t{%[m], %[n]|%[n], %[m]}"
        : "=@ccb"(r) : [n]"r"(n), [m]"re"(m) : "cc");
    return r;
}

template <class U>
constexpr bool less(U n, U m) noexcept {
    if (std::is_constant_evaluated())
        return n < m;
    return less_asm(n, m);
}

static_assert(less(uint64_t(0), uint64_t(1)));

bool g(uint64_t n) {
    n *= 5000000001;
    return less<uint64_t>(n, 5000000001);
}
GCC 9.2 (with -O2 -std=c++2a) generates this:
g(unsigned long):
        movabsq $5000000001, %rax
        imulq   %rax, %rdi
        cmpq    %rax, %rdi
        setc    %al
        ret
Update: The snippet below shows two improvements:
1. It uses __builtin_constant_p() instead of std::is_constant_evaluated(). (Prior versions of GCC complain about asm appearing in a constexpr function.)
2. The asm statement is wrapped in a lambda, so there is no longer a separate helper less_asm.

template <class U>
constexpr bool less(U n, U m) noexcept {
    if (__builtin_constant_p(n < m))
        return n < m;
    return [&]{
        bool r;
        asm("cmp\t{%[m], %[n]|%[n], %[m]}"
            : "=@ccb"(r) : [n]"r"(n), [m]"re"(m) : "cc");
        return r;
    }();
}
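For reference, a usage sketch, assuming the updated less behaves like the earlier version on GCC 9.2 with -O2:

// Sketch only: g from the question rewritten against the updated less().
// The expectation is the same assembly as above, i.e. a single movabsq whose
// register is reused by both imulq and cmpq.
bool g(uint64_t n) {
    n *= 5000000001;
    return less<uint64_t>(n, 5000000001);
}

// Constant arguments should still work in constant expressions thanks to the
// __builtin_constant_p branch.
static_assert(less(uint64_t(0), uint64_t(1)));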