
Different gcc output for __builtin_clzll on different optimisation levels and wrapped in a function

Tags: c, linux, gcc

I am puzzled by the output of the following code: either this is a bug or I am missing something. (Ubuntu 16.04 on a Skylake CPU.)

#include <iostream>

int wrap(unsigned long long val) {
    return __builtin_clzll(val);
} 

using namespace std;
int main() {
    cout << __builtin_clzll(0) << " " << wrap(0) << endl;
    cout << __builtin_clzll(1) << " " << wrap(1) << endl;
    cout << __builtin_clzll(2) << " " << wrap(2) << endl;
}

Here are the outputs with different compile settings. I do know that clz may return an undefined result if zero is passed. However, the direct call in main always works fine, but as soon as the call goes through the wrapper function (and so through the stack) the compiler messes up.

snk@maggy:~/HCS$ g++ -O0 test.cpp -o test
snk@maggy:~/HCS$ ./test
64 4196502
63 63
62 62
snk@maggy:~/HCS$ 

At optimisation levels above -O0 the result does not change; I guess gcc is inlining. This is the expected result:

snk@maggy:~/HCS$ g++ -O1 test.cpp -o test
snk@maggy:~/HCS$ ./test
64 64
63 63
62 62

It gets even better with -mlzcnt:

snk@maggy:~/HCS$ g++ -O0 -mlzcnt test.cpp -o test
snk@maggy:~/HCS$ ./test
64 0
63 0
62 1

snk@maggy:~/HCS$ g++ -O1 -mlzcnt test.cpp -o test
snk@maggy:~/HCS$ ./test
64 64
63 63
62 62

snk@maggy:~/HCS$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks, Ch

chhu79 asked Jul 14 '17 16:07


2 Answers

The interesting case in this question is the behaviour with -mlzcnt. This was reported as GCC bug 58928 in 2013 but the bug report was later retracted, because it is "expected" behaviour when you supply -mlzcnt for Intel CPUs which do not support the LZCNT opcode.
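Whether the running CPU actually advertises LZCNT can be checked at run time before trusting a binary built with -mlzcnt. The following is a minimal sketch, assuming an x86 target and the <cpuid.h> header shipped with GCC and Clang; cpu_has_lzcnt is a hypothetical helper name, not something from the original posts. It reads CPUID leaf 0x80000001 and tests ECX bit 5, which is the documented LZCNT/ABM feature flag.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical helper (not from the original posts): reports whether the
// running CPU advertises LZCNT via CPUID leaf 0x80000001, ECX bit 5.
bool cpu_has_lzcnt() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not available.
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ecx & (1u << 5)) != 0;
#else
    // Not an x86 target, so there is no LZCNT instruction to detect.
    return false;
#endif
}
```

A program could call cpu_has_lzcnt() once at startup and refuse to run (or fall back to a different code path) when it returns false.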

As it turns out, LZCNT is encoded as BSR (Bit Scan Reverse) with an F3 prefix; on Intel CPUs which don't implement LZCNT, rather than being trapped as an invalid opcode, it is executed as a BSR, which returns the bit index of the most significant set bit (with bit 0 being the low-order bit), rather than the number of leading 0s.
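The relationship between the two results can be illustrated in portable code. This is only a sketch, assuming a 64-bit unsigned long long as on the asker's platform; highest_set_bit and leading_zeros_from_index are hypothetical helper names. For a nonzero input, a BSR-style index and an LZCNT-style count always add up to 63, which is consistent with the question's -O0 -mlzcnt run printing 0 and 1 where 63 and 62 were expected.

```cpp
// Portable stand-in for what BSR computes on a nonzero input: the index of
// the most significant set bit, counting bit 0 as the low-order bit.
// The caller must pass val != 0; for 0 this helper simply returns 0.
int highest_set_bit(unsigned long long val) {
    int index = 0;
    while (val >>= 1) {
        ++index;
    }
    return index;
}

// What LZCNT computes for a nonzero 64-bit input, expressed through the
// BSR-style index. Assumes unsigned long long is 64 bits wide.
int leading_zeros_from_index(unsigned long long val) {
    return 63 - highest_set_bit(val);
}
```

For example, highest_set_bit(2) is 1 while leading_zeros_from_index(2) is 62, the same pair of values seen in the wrapped and direct columns of the question's output.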

As indicated, invoking __builtin_clz with argument 0 produces undefined behaviour. You should have no expectations about the result of undefined behaviour; not even that it will be the same result twice.
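If a defined result for zero is needed, one common approach is to test for zero before calling the builtin. The sketch below is an illustration rather than anything proposed in the original answers; clzll_or_width is a hypothetical name, and returning the full bit width for zero is just one possible convention.

```cpp
#include <limits>

// Hypothetical wrapper: defines the zero case explicitly instead of relying
// on whatever __builtin_clzll happens to return for 0.
int clzll_or_width(unsigned long long val) {
    if (val == 0) {
        // Chosen convention: every bit of a zero value is a leading zero.
        return std::numeric_limits<unsigned long long>::digits;
    }
    return __builtin_clzll(val);
}
```

With this wrapper the zero case no longer depends on the optimisation level or on which instruction the compiler selects.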

rici answered Nov 20 '22 22:11


Per the GCC documentation for built-in functions (bold text added)

Built-in Function: int __builtin_clz (unsigned int x)

Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

...

Built-in Function: int __builtin_clzll (unsigned long long)

Similar to __builtin_clz, except the argument type is unsigned long long.

The result for 0 is undefined.

Andrew Henle answered Nov 20 '22 23:11