Consider the following code:
bool AllZeroes(const char buf[4]) { return buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0; }
Output assembly from Clang 13 with -O3
:
AllZeroes(char const*): # @AllZeroes(char const*) cmp byte ptr [rdi], 0 je .LBB0_2 xor eax, eax ret .LBB0_2: cmp byte ptr [rdi + 1], 0 je .LBB0_4 xor eax, eax ret .LBB0_4: cmp byte ptr [rdi + 2], 0 je .LBB0_6 xor eax, eax ret .LBB0_6: cmp byte ptr [rdi + 3], 0 sete al ret
Each byte is compared individually, but it could've been optimized into a single 32-bit int comparison:
bool AllZeroes(const char buf[4]) { return *(int*)buf == 0; }
Resulting in:
AllZeroes2(char const*): # @AllZeroes2(char const*) cmp dword ptr [rdi], 0 sete al ret
I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C++ specification?
Edit: Changing the short-circuited AND (&&
) to bitwise AND (&
) will generate the optimized code. Also, changing the order the bytes are compared doesn't affect the code gen: https://godbolt.org/z/Y7TcG93sP
If buf[0]
is nonzero, the code will not access buf[1]
. So the function should return false
without checking the other buf
elements. If buf
is close to the end of the last memory page, buf[1]
may trigger an access fault. The compiler should be very careful to not read stuff which may be forbidden to read.
The first thing to understand is that f(const char buf[4])
does not guarantee that the pointer points to 4 elements, it means exactly the same as const char *buf
, the 4 is completely ignored by the language. (C99 has a solution to this, but it's not supported in C++, more on that below)
Given AllZeroes(memset(malloc(1),~0,1))
, the implementation
bool AllZeroes(const char buf[4]) { return buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0; }
should work, because it never tries to read byte #2 (which doesn't exist) when it notices that byte #1 is non-zero, while the implementation
bool AllZeroes(const int32_t *buf) { return (*buf == 0); }
should segfault as it tries to read the first 4 bytes while only 1 byte exists (malloced 1 byte only)
FWIW Clang gets it right (and GCC doesn't) in C99 with the implementation
_Bool AllZeroes(const char buf[static 4]) { return buf[0] == 0 & buf[1] == 0 & buf[2] == 0 & buf[3] == 0; }
which compiles to the same as
_Bool AllZeroes(const int32_t *buf) { return (*buf == 0); }
see https://godbolt.org/z/Grqs3En3K (thanks to Caze @libera #C for finding that)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With