
Wrong GCC 9 (and higher) optimization of memcmp with -fno-inline

I have a small function, func, which compares a memory block against a static const zero-filled array. Here is a simplified example that illustrates the problem:

#include <cstring>
#include <cstdint>

#define MAX_BYTES (256)

inline int my_memcmp(const void * mem1, const void * mem2, const size_t size)
{
    const auto *first  = reinterpret_cast<const uint8_t *>(mem1);
    const auto *second = reinterpret_cast<const uint8_t *>(mem2);
    if (size < 8)
    {
        for (int i = 0; i < size; ++i) {
            if (*first != *second) return (*first > *second) ? 1 : -1;
            ++first; ++second;
        }
        return 0;
    }

    return std::memcmp(mem1, mem2, size);
}

bool func(const uint8_t* in, size_t size)
{
  size_t remain = size;
  static const uint8_t zero_arr[MAX_BYTES] = { 0 };

  while (remain >= MAX_BYTES)
  {
    if (my_memcmp(in, zero_arr, MAX_BYTES) != 0)
    {
      return false;
    }
    remain -= MAX_BYTES;
    in += MAX_BYTES;
  }

  return true;
}
  • Compiler: gcc 9.1 and higher
  • Compiler flags: -fno-inline -O3
  • Godbolt disassembly link: https://godbolt.org/z/P8vKGq
  • Godbolt program execution link: https://godbolt.org/z/qr8f16

When I pass -fno-inline, the compiler still optimizes the code above and emits only two instructions for the specialized my_memcmp clone. It just loads the first byte of the input, so it seems to return 0 whenever that byte is zero, regardless of the rest of the block:

my_memcmp(void const*, void const*, unsigned long) [clone .constprop.0]:
        movzx   eax, BYTE PTR [rdi]
        ret

The problem reproduces only with -fno-inline. (I ran into it while building for coverage testing, where I needed no-inline to make the report clearer.) I also found that GCC 8 does not have this problem. Is there a reasonable explanation, or is this simply a bug in both GCC 9 and 10?

Rom098 asked Sep 03 '20 13:09



1 Answer

This is GCC bug 95189, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189

Basically, GCC can emit specialized inline code for memcmp when one of the buffers has known contents, but this specialization mishandles a zero byte in that buffer: it treats the zero as the end of the data, which is correct for string functions such as strcmp but not for memcmp.
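To make the distinction concrete, here is a small sketch showing why a zero byte must not end a memcmp comparison the way it ends a strcmp one. The helper names are made up for this illustration and do not come from the bug report:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helpers, named for this illustration only.

// strcmp stops at the first '\0', so bytes after it are never examined.
inline bool same_c_string(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// memcmp must examine all n bytes, including any that follow a zero byte.
inline bool same_bytes(const void* a, const void* b, std::size_t n)
{
    return std::memcmp(a, b, n) == 0;
}
```

For the buffers {'a', 0, 'b'} and {'a', 0, 'c'}, same_c_string reports equality because both strings end after 'a', while same_bytes over three bytes must report a difference.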

It appears to be fixed already on the GCC main development branch (trunk), but the fix has not yet been backported to the 9.x and 10.x release branches.
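On an affected compiler, one possible workaround for the zero check in the question is to test the bytes directly instead of calling memcmp against a constant zero array. This is only a sketch: the function name all_zero is hypothetical, and the assumption that it sidesteps the faulty expansion rests solely on the fact that no memcmp call is involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical replacement for the zero check; not taken from the bug report.
// Returns true when every one of the `size` bytes starting at `in` is zero.
// Note: unlike func in the question, this also checks a trailing partial block.
inline bool all_zero(const std::uint8_t* in, std::size_t size)
{
    return std::all_of(in, in + size,
                       [](std::uint8_t b) { return b == 0; });
}
```

An empty range counts as all zero here, which matches what func returns for inputs shorter than MAX_BYTES.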

The following minimal repro in C is miscompiled at -O2; a similar example is mentioned in the comments on the bug:

int f(const char *p)
{
    return __builtin_memcmp(p, "\0\0\0", 4);
}
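To check whether a given compiler build is affected, the repro above can be compared against a plain byte loop. The following is a rough sketch, assuming GCC or Clang (for __builtin_memcmp); the names reference_memcmp, builtin_compare, and builtin_agrees are invented for this example:

```cpp
#include <cstddef>

// Reference comparison written as a plain loop; the name is hypothetical.
// Returns <0, 0, or >0 like memcmp, comparing bytes as unsigned char.
int reference_memcmp(const void* a, const void* b, std::size_t n)
{
    const auto* p = static_cast<const unsigned char*>(a);
    const auto* q = static_cast<const unsigned char*>(b);
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != q[i]) return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

// Same shape as the repro: one buffer is a constant made of zero bytes.
int builtin_compare(const char* p)
{
    return __builtin_memcmp(p, "\0\0\0", 4);
}

// True when both comparisons agree on equal / not equal for this input.
bool builtin_agrees(const char* p)
{
    return (builtin_compare(p) == 0) == (reference_memcmp(p, "\0\0\0", 4) == 0);
}
```

Calling builtin_agrees on a 4-byte buffer such as {0, 0, 1, 0}, compiled at the optimization level under test, should return true on a correct compiler; a false result would point to the miscompilation described here.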
amonakov answered Oct 14 '22 00:10
