Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I get rid of a sign-extend between CTZ and addition to a pointer?

Tags:

intel

For code such as this:

#include <stdint.h>

char* ptrAdd(char* ptr, uint32_t x)
{
    return ptr + (uint32_t)__builtin_ctz(x);
}

GCC generates a sign-extension: (godbolt link)

xor eax, eax
rep bsf eax, esi
cdqe ; sign-extend eax into rax
add rax, rdi
ret

This is, of course, completely redundant - this is blatantly sign-extending an unsigned integer. Can I convince GCC not to do this?

The problem exists since GCC 4.9.0, but before that it used to be an explicit zero-extension which is also redundant.

like image 906
harold Avatar asked Feb 06 '18 02:02

harold


1 Answers

A partial solution is to use the 64-bit version of ctz, along with a -march argument so that tzcnt is used instead of bsf, like so:

char* ptrAdd(char* ptr, uint32_t x)
{
    return ptr + __builtin_ctzl(x);
}

This results in no sign extension:

ptrAdd(char*, unsigned int):
  mov eax, esi
  tzcnt rax, rax
  add rax, rdi
  ret

It has a mov (to do the 32 to 64-bit zero extension) which replaced a zeroing xor in the 32-bit version (which was there to work around the tzcnt false-dependency-on-destination issue). Those are about the same cost, but the mov is more likely to disappear after inlining. The result of a 64-bit tzcnt is the same as a 32-bit one, except for the case of zero input which is undefined (as far as the gcc intrinsic goes, not tzcnt).

Unfortunately, without a -march argument that lets the compiler use tzcnt it will use bsf and in that case still does the sign extension.

It seems that the origin of the differing behavior between bsf and tzcnt is that in the case that bsf version is used, the instruction behavior is undefined at zero. So in principle, the instruction could return anything, even values outside the range 0 to 63 that we would normally expect. Combined with the fact that the return value is declared as int, simply omitting the sign extension could lead to "impossible" situations like (__builtin_clzl (x) & 0xff) == 0xdeadbeef.

Now per the gcc docs, zero input to __builtin_ctzl has an "undefined result" - but it isn't clear if this is the same as C/C++ "undefined behavior" where anything can happen (which would allow impossible things), or just means "some unspecified value".

You can read about this on the gcc bugzilla, where an issue has been open for about 7 years.

like image 183
BeeOnRope Avatar answered Nov 02 '22 23:11

BeeOnRope