I am puzzled by the behavior/output of the following code: either this is a bug or I am missing something. (Ubuntu 16.04 on Skylake.)
#include <iostream>

int wrap(unsigned long long val) {
    return __builtin_clzll(val);
}

using namespace std;

int main() {
    cout << __builtin_clzll(0) << " " << wrap(0) << endl;
    cout << __builtin_clzll(1) << " " << wrap(1) << endl;
    cout << __builtin_clzll(2) << " " << wrap(2) << endl;
}
Here are the outputs with different compiler settings. I do know that clz may return an undefined result if zero is passed. However, the direct call always works fine, but as soon as the call goes through the wrapper function (and the stack is involved), the compiler messes up.
snk@maggy:~/HCS$ g++ -O0 test.cpp -o test
snk@maggy:~/HCS$ ./test
64 4196502
63 63
62 62
snk@maggy:~/HCS$
Optimization levels above -O0 all give the same result; I guess gcc is inlining. This is the expected result:
snk@maggy:~/HCS$ g++ -O1 test.cpp -o test
snk@maggy:~/HCS$ ./test
64 64
63 63
62 62
It gets even better with -mlzcnt:
snk@maggy:~/HCS$ g++ -O0 -mlzcnt test.cpp -o test
snk@maggy:~/HCS$ ./test
64 0
63 0
62 1
snk@maggy:~/HCS$ g++ -O1 -mlzcnt test.cpp -o test
snk@maggy:~/HCS$ ./test
64 64
63 63
62 62
snk@maggy:~/HCS$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Thanks, Ch
The interesting case in this question is the behaviour with -mlzcnt. This was reported as GCC bug 58928 in 2013, but the bug report was later retracted because it is "expected" behaviour when you supply -mlzcnt for Intel CPUs which do not support the LZCNT opcode.
As it turns out, LZCNT is a BSR (Bit Scan Reverse) with an F3 prefix; on Intel CPUs which don't implement LZCNT, rather than being trapped as an invalid opcode, it is interpreted as a BSR, which returns the bit position of the highest-order 1-bit (with bit 0 being the low-order bit), rather than the number of preceding 0s.
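The relationship between the two results can be sketched in portable C++. The helper names below (highest_set_bit, leading_zeros) are made up for illustration and only model what the two instructions report for a 64-bit operand; they are not how GCC implements the builtin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling a BSR-style result: the index of the
// highest set bit, with bit 0 being the low-order bit.
// Precondition: x != 0 (BSR leaves its destination undefined for 0).
int highest_set_bit(std::uint64_t x) {
    assert(x != 0);
    int index = 0;
    while (x >>= 1) {
        ++index;
    }
    return index;
}

// Hypothetical helper modelling an LZCNT-style result on 64 bits:
// the number of leading zero bits, which is 64 when x == 0.
int leading_zeros(std::uint64_t x) {
    if (x == 0) {
        return 64;
    }
    // For nonzero x the two results are mirror images of each other.
    return 63 - highest_set_bit(x);
}
```

For the inputs in the question, highest_set_bit(1) is 0 and highest_set_bit(2) is 1, which matches the "0" and "1" printed by wrap(1) and wrap(2) in the -O0 -mlzcnt run, while leading_zeros gives the expected 63 and 62.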
As indicated, invoking __builtin_clz with argument 0 produces undefined behaviour. You should have no expectations about the result of undefined behaviour; not even that it will be the same result twice.
Per the GCC documentation for built-in functions (bold text added):
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
...
Built-in Function: int __builtin_clzll (unsigned long long)
Similar to __builtin_clz, except the argument type is unsigned long long.
The result for 0 is undefined.
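If a defined result for zero is needed, one option is to handle that case before the builtin is reached. The following is a minimal sketch, assuming a 64-bit unsigned long long; the names clzll_or_64 and clzll_checked are hypothetical and not part of GCC.

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(unsigned long long) * CHAR_BIT == 64,
              "this sketch assumes a 64-bit unsigned long long");

// Hypothetical wrapper: returns 64 for zero instead of relying on the
// undefined result of __builtin_clzll(0).
int clzll_or_64(unsigned long long val) {
    return val == 0 ? 64 : __builtin_clzll(val);
}

// Hypothetical alternative: keep the precondition, but make a violation
// visible in debug builds (the assert is compiled out under NDEBUG).
int clzll_checked(unsigned long long val) {
    assert(val != 0 && "__builtin_clzll(0) has an undefined result");
    return __builtin_clzll(val);
}
```

Neither wrapper helps with the -mlzcnt case: if the binary is built with -mlzcnt and run on a CPU that decodes the instruction as BSR, nonzero inputs would still produce the wrong values.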