Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why don't modern compilers coalesce neighboring memory accesses?

Consider the following code:

bool AllZeroes(const char buf[4]) {     return buf[0] == 0 &&            buf[1] == 0 &&            buf[2] == 0 &&            buf[3] == 0; } 

Output assembly from Clang 13 with -O3:

AllZeroes(char const*):                        # @AllZeroes(char const*)         cmp     byte ptr [rdi], 0         je      .LBB0_2         xor     eax, eax         ret .LBB0_2:         cmp     byte ptr [rdi + 1], 0         je      .LBB0_4         xor     eax, eax         ret .LBB0_4:         cmp     byte ptr [rdi + 2], 0         je      .LBB0_6         xor     eax, eax         ret .LBB0_6:         cmp     byte ptr [rdi + 3], 0         sete    al         ret 

Each byte is compared individually, but it could've been optimized into a single 32-bit int comparison:

bool AllZeroes(const char buf[4]) {     return *(int*)buf == 0; } 

Resulting in:

AllZeroes2(char const*):                      # @AllZeroes2(char const*)         cmp     dword ptr [rdi], 0         sete    al         ret 

I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C++ specification?

Edit: Changing the short-circuited AND (&&) to bitwise AND (&) will generate the optimized code. Also, changing the order the bytes are compared doesn't affect the code gen: https://godbolt.org/z/Y7TcG93sP

like image 499
Daniel Avatar asked Nov 25 '21 11:11

Daniel


2 Answers

If buf[0] is nonzero, the code will not access buf[1]. So the function should return false without checking the other buf elements. If buf is close to the end of the last memory page, buf[1] may trigger an access fault. The compiler should be very careful to not read stuff which may be forbidden to read.

like image 145
anatolyg Avatar answered Oct 11 '22 11:10

anatolyg


The first thing to understand is that f(const char buf[4]) does not guarantee that the pointer points to 4 elements, it means exactly the same as const char *buf, the 4 is completely ignored by the language. (C99 has a solution to this, but it's not supported in C++, more on that below)

Given AllZeroes(memset(malloc(1),~0,1)), the implementation

bool AllZeroes(const char buf[4]) {     return buf[0] == 0 &&            buf[1] == 0 &&            buf[2] == 0 &&            buf[3] == 0; } 

should work, because it never tries to read byte #2 (which doesn't exist) when it notices that byte #1 is non-zero, while the implementation

bool AllZeroes(const int32_t *buf) {     return (*buf == 0); } 

should segfault as it tries to read the first 4 bytes while only 1 byte exists (malloced 1 byte only)

FWIW Clang gets it right (and GCC doesn't) in C99 with the implementation

_Bool AllZeroes(const char buf[static 4]) {     return buf[0] == 0 &            buf[1] == 0 &            buf[2] == 0 &            buf[3] == 0; } 

which compiles to the same as

_Bool AllZeroes(const int32_t *buf) {     return (*buf == 0); }  

see https://godbolt.org/z/Grqs3En3K (thanks to Caze @libera #C for finding that)

  • unfortunately buf[static 4], which in C99 is a guarantee-from-the-programmer-to-the-compiler that the pointer points to minimum 4 elements, is not supported in C++
like image 32
hanshenrik Avatar answered Oct 11 '22 10:10

hanshenrik