Why are these 8 byte-writes not optimized into a MOV?

My colleague and I cannot explain why GCC, ICC and Clang do not optimize this function

#include <cstdint>

void f(std::uint64_t a, void * p) {
    std::uint8_t *x = reinterpret_cast<std::uint8_t *>(p);
    x[7] = a >> 56;
    x[6] = a >> 48;
    x[5] = a >> 40;
    x[4] = a >> 32;
    x[3] = a >> 24;
    x[2] = a >> 16;
    x[1] = a >> 8;
    x[0] = a;
}

into this:

mov     QWORD PTR [rsi], rdi

If we write f in terms of memcpy, the compiler emits just that mov. Why does this not happen for the seemingly trivial sequence of byte writes?
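For comparison, here is a sketch of the two formulations side by side. The names `f_bytes` and `f_memcpy` are hypothetical, and the `memcpy` body is an assumption about what "formulate f in terms of memcpy" means, since the question does not show it. The two write the same bytes only on a little-endian target such as x86-64, so before a compiler can turn the byte writes into one `mov` it has to recognize the shift pattern as a little-endian 64-bit store.

```cpp
#include <cstdint>
#include <cstring>

// Byte-by-byte store, as in the question: always writes `a` in
// little-endian order (least significant byte at p[0]), whatever the host.
void f_bytes(std::uint64_t a, void *p) {
    std::uint8_t *x = static_cast<std::uint8_t *>(p);
    x[7] = static_cast<std::uint8_t>(a >> 56);
    x[6] = static_cast<std::uint8_t>(a >> 48);
    x[5] = static_cast<std::uint8_t>(a >> 40);
    x[4] = static_cast<std::uint8_t>(a >> 32);
    x[3] = static_cast<std::uint8_t>(a >> 24);
    x[2] = static_cast<std::uint8_t>(a >> 16);
    x[1] = static_cast<std::uint8_t>(a >> 8);
    x[0] = static_cast<std::uint8_t>(a);
}

// memcpy formulation (assumed): copies the object representation of `a`,
// i.e. its bytes in the host's native order. Identical to f_bytes only on
// a little-endian target such as x86-64.
void f_memcpy(std::uint64_t a, void *p) {
    std::memcpy(p, &a, sizeof a);
}
```

The `memcpy` version is portable C++ for "store these 8 bytes as-is", which compilers readily lower to a single store; the byte-by-byte version instead fixes the byte order explicitly.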

Johannes Schaub - litb asked Oct 23 '17 20:10


1 Answer

I'm not an expert, but GCC only gained the ability to merge adjacent stores of immediate constants in GCC 7:

  • Closed bug for immediate constants: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684
  • Open bug for assignment of small structs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821
  • Store-merging pass code: https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-ssa-store-merging.c
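To illustrate the case that was fixed, here is a sketch of eight adjacent byte stores of immediate constants (the function name `store_constants` is hypothetical). This is the pattern GCC's store-merging pass targets, so GCC 7 or later at -O2 may emit one 64-bit store of the combined constant on x86-64 instead of eight byte stores; inspecting the generated assembly is the only way to confirm this for a given compiler version and target.

```cpp
#include <cstdint>

// Eight adjacent byte stores whose values are all compile-time constants.
// Together they form the little-endian encoding of 0x0102030405060708,
// which a store-merging pass can write with a single 64-bit store.
void store_constants(void *p) {
    std::uint8_t *x = static_cast<std::uint8_t *>(p);
    x[0] = 0x08;
    x[1] = 0x07;
    x[2] = 0x06;
    x[3] = 0x05;
    x[4] = 0x04;
    x[5] = 0x03;
    x[6] = 0x02;
    x[7] = 0x01;
}
```

The question's function differs in that each stored byte is a shifted copy of the runtime value `a` rather than a constant, which is the harder case the open bug is about.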

If I had to guess, judging by the second bug, it might not be too long a wait.

Jeff Garrett answered Sep 20 '22 13:09