For code such as this:
#include <stdint.h>
char* ptrAdd(char* ptr, uint32_t x)
{
return ptr + (uint32_t)__builtin_ctz(x);
}
GCC generates a sign-extension: (godbolt link)
xor eax, eax
rep bsf eax, esi
cdqe ; sign-extend eax into rax
add rax, rdi
ret
This is, of course, completely redundant - this is blatantly sign-extending an unsigned integer. Can I convince GCC not to do this?
The problem exists since GCC 4.9.0, but before that it used to be an explicit zero-extension which is also redundant.
A partial solution is to use the 64-bit version of ctz
, along with a -march
argument so that tzcnt
is used instead of bsf
, like so:
char* ptrAdd(char* ptr, uint32_t x)
{
return ptr + __builtin_ctzl(x);
}
This results in no sign extension:
ptrAdd(char*, unsigned int):
mov eax, esi
tzcnt rax, rax
add rax, rdi
ret
It has a mov
(to do the 32 to 64-bit zero extension) which replaced a zeroing xor
in the 32-bit version (which was there to work around the tzcnt
false-dependency-on-destination issue). Those are about the same cost, but the mov
is more likely to disappear after inlining. The result of a 64-bit tzcnt
is the same as a 32-bit one, except for the case of zero input which is undefined (as far as the gcc
intrinsic goes, not tzcnt
).
Unfortunately, without a -march
argument that lets the compiler use tzcnt
it will use bsf
and in that case still does the sign extension.
It seems that the origin of the differing behavior between bsf
and tzcnt
is that in the case that bsf
version is used, the instruction behavior is undefined at zero. So in principle, the instruction could return anything, even values outside the range 0 to 63 that we would normally expect. Combined with the fact that the return value is declared as int
, simply omitting the sign extension could lead to "impossible" situations like (__builtin_clzl (x) & 0xff) == 0xdeadbeef
.
Now per the gcc docs, zero input to __builtin_ctzl
has an "undefined result" - but it isn't clear if this is the same as C/C++ "undefined behavior" where anything can happen (which would allow impossible things), or just means "some unspecified value".
You can read about this on the gcc bugzilla, where an issue has been open for about 7 years.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With