Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC seems to prefer small immediate values in comparisons. Is there a way to avoid that?

Firstly a trivial mathematical fact: given integers n and m, we have n < m if, and only if, n <= m - 1.

GCC seems to prefer immediate values of smaller absolute value. Hence, when m is known and other conditions are met, the compiler chooses among equivalent comparison expressions the one minimizing absolute values. For instance, it prefers n <= 1000 over n < 1001 and GCC 9.2 translates this

bool f(uint32_t n) {
  return n < 1001;
}

into this x86 assembly code

f(unsigned int):
  cmpl $1000, %edi
  setbe %al
  ret

There might be good performance reasons for that but that's not my question. What I'd like to know is this: Is there a way to force GCC to keep the original comparison? More specifically, I'm not worried about portability and thus, GCC specifics (options, pragmas, attributes, ...) are OK for me. However, I'm looking for a constexpr friendly solution which seems to exclude inline asm. Finally, I'm targeting C++17 which excludes things like std::is_constant_evaluated. (Having said that, please, fell free to provide answers regardless of my constraints because it might still be useful for others.)

You might ask why I want to do such thing. Here we go. To my understanding (please, correct me if I'm wrong) this behavior might be a "pessimization" for x86_64 in the following example:

bool g(uint64_t n) {
  n *= 5000000001;
  return n < 5000000001;
}

which is translated by GCC 6.2 into

g(unsigned long):
  movabsq $5000000001, %rax
  imulq %rax, %rdi
  movabsq $5000000000, %rax
  cmpq %rax, %rdi
  setbe %al
  ret

In x86_64, computations with 64-bits immediate values have some restrictions that might imply these values to be loaded into registers. In the above example, this happens twice: constants 5000000001 and 5000000000 are stored in rax for the multiplication and the comparison. Had GCC kept the original comparison as it appears in the C++ code (i.e., against 5000000001) there would be no need for the second movabs.

This also implies a code size penalty which, I guess, was considered an issue and more recent versions of GCC (e.g. 9.2) produce this:

g(unsigned long):
  movabsq $5000000001, %rax
  imulq %rax, %rdi
  subq $1, %rax
  cmpq %rax, %rdi
  setbe %al
  ret

Hence the 10-bytes long movabs was replaced by a 4-bytes long subq instruction. In any case, subq also seems unnecessary.

like image 580
Cassio Neri Avatar asked Nov 07 '22 11:11

Cassio Neri


2 Answers

In general the only way to force GCC to output a particular instruction sequence that you might want is to use assembly. If GCC isn't generating optimal code, the best thing to do is to submit a bug report so hopefully it'll be improved in future versions.

For your specific case there's at least one work around that can generate the code you want, at least for current compilers. While it doesn't require writing the entire code sequence in assembly, it does use inline assembly to load the constant into a register on the assumption that GCC will prefer using this register rather than generating a new constant value for the comparison.

#include <stdint.h>

static uint64_t 
load_const_asm(uint64_t c) {
    if (__builtin_constant_p(c) && (int32_t) c != c) {
           asm("" : "+r" (c));
    }
    return c;
}

bool
g(uint64_t n) {
    uint64_t c = load_const_asm(5000000001);
    n *= c;
    return n < c;
}

which generates the following code when compiled with -O on GCC 9.2:

_Z1gm:
  movabs rax, 5000000001
  imul rdi, rax
  cmp rdi, rax
  setb al
  ret

This asm statement tricks the compiler in not generating a separate load of the constant minus one because the compiler doesn't know what asm statement does, even though its just an empty string. It forces the constant into a register, but the compiler has no idea what the register will contain after the asm statement "executes".

The advantage of doing it this way, over using an asm statement with a CMP instruction, is that it gives the compiler the most freedom optimize your code. This is a big disadvantage of inline assembly generally, it's easy for it to end up pessimizing your code. For example, using inline assembly prevents the compiler from calculating the result of g() at compile time if n is a constant. However with my code example above, it's still able to figure out that g() must return true if n is 1.

Finally, make sure that you're not trying to prematurely optimize your code. You don't want to be obfuscating your code like this if it's not going to make any noticeable different to performance your application. If you do end up using this code, or some other hack, then as Peter Cordes said in a comment, you should clearly comment your code to explain why the hack is necessary so it can be removed when it's no longer needed.

like image 105
Ross Ridge Avatar answered Nov 15 '22 07:11

Ross Ridge


Here is a C++20 solution which, sadly, I cannot use

#include <cstdint>
#include <type_traits>

template <class U>
bool less_asm(U n, U m) noexcept {
  bool r;
  asm("cmp%z[m]\t{%[m], %[n]|%[n], %[m]}"
    : "=@ccb"(r) : [n]"r"(n), [m]"re"(m) : "cc");
  return r;
}

template <class U>
constexpr bool less(U n, U m) noexcept {
  if (std::is_constant_evaluated())
    return n < m;
  return less_asm(n, m);
}

static_assert(less(uint64_t(0), uint64_t(1)));

bool g(uint64_t n) {
  n *= 5000000001;
  return less<uint64_t>(n, 5000000001);
}

GCC 9.2 (with -O2 -std=c++2a) generates this:

g(unsigned long):
  movabsq $5000000001, %rax
  imulq %rax, %rdi
  cmpq %rax, %rdi
  setc %al
  ret

Update: The snippet below shows two improvements:

  1. It works with C++17 but it requires GCC-9.1 or later. (Thanks to Peter Cordes' suggestion to use __builtin_constant_p() instead of std::is_constant_evaluated(). Prior versions of GCC complain about asm appearing in a constexpr function.)
  2. It introduces fewer names in the scope. (By using a lambda instead of less_asm.)


template <class U>
constexpr bool less(U n, U m) noexcept {
  if (__builtin_constant_p(n < m))
    return n < m;
  return [&]{
    bool r;
    asm("cmp\t{%[m], %[n]|%[n], %[m]}"
      : "=@ccb"(r) : [n]"r"(n), [m]"re"(m) : "cc");
    return r;
  }();
}
like image 23
Cassio Neri Avatar answered Nov 15 '22 06:11

Cassio Neri