A developer can use the __builtin_expect builtin to help the compiler understand in which direction a branch is likely to go.
In the future, we may get a standard attribute for this purpose, but as of today at least clang, icc and gcc all support the non-standard __builtin_expect instead.
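For readers who have not used the builtin directly, here is a minimal sketch of a hint on an ordinary if branch. The function sum_or_minus_one and the EXPECT wrapper macro are hypothetical names invented for this illustration, and the fallback for compilers without the builtin is an added assumption, not part of the original question.

```c
#include <assert.h>
#include <stddef.h>

/* EXPECT is a hypothetical wrapper: it uses the builtin where available
   and degrades to the bare expression elsewhere (assumed fallback). */
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define EXPECT(expr, value) __builtin_expect((expr), (value))
#else
#define EXPECT(expr, value) (expr)
#endif

/* Hypothetical helper: sums n ints and treats a NULL array as a rare error. */
long sum_or_minus_one(const int *v, size_t n)
{
    /* Hint: the comparison is expected to be 0, i.e. the error path is cold. */
    if (EXPECT(v == NULL, 0))
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* The hint never changes the result, only (possibly) the code layout. */
void check_sum_or_minus_one(void)
{
    int v[3] = {1, 2, 3};
    assert(sum_or_minus_one(v, 3) == 6);
    assert(sum_or_minus_one(NULL, 0) == -1);
}
```

The second argument is the value the expression is expected to have, so 0 here marks the NULL check as the unlikely case.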
However, icc seems to generate oddly terrible code when you use it¹. That is, code that uses the builtin is strictly worse than the code without it, regardless of which direction the prediction is made.
Take for example the following toy function:
    int foo(int a, int b)
    {
        do {
            a *= 77;
        } while (b-- > 0);
        return a * 77;
    }
Out of the three compilers, icc is the only one that compiles this to the optimal scalar loop of 3 instructions:
    foo(int, int):
    ..B1.2:                         # Preds ..B1.2 ..B1.1
            imul      edi, edi, 77  #4.6
            dec       esi           #5.12
            jns       ..B1.2        # Prob 82% #5.18
            imul      eax, edi, 77  #6.14
            ret
Both gcc and clang manage to miss the easy solution and use 5 instructions.
On the other hand, when you use likely or unlikely macros on the loop condition, icc goes totally braindead:
    #define likely(x)   __builtin_expect((x), 1)
    #define unlikely(x) __builtin_expect((x), 0)

    int foo(int a, int b)
    {
        do {
            a *= 77;
        } while (likely(b-- > 0));
        return a * 77;
    }
This loop is functionally equivalent to the previous loop (since __builtin_expect just returns its first argument), yet icc produces some awful code:
    foo(int, int):
            mov       eax, 1        #9.12
    ..B1.2:                         # Preds ..B1.2 ..B1.1
            xor       edx, edx      #9.12
            test      esi, esi      #9.12
            cmovg     edx, eax      #9.12
            dec       esi           #9.12
            imul      edi, edi, 77  #8.6
            test      edx, edx      #9.12
            jne       ..B1.2        # Prob 95% #9.12
            imul      eax, edi, 77  #11.15
            ret                     #11.15
The function has doubled in size to 10 instructions, and (worse yet!) the critical loop has more than doubled to 7 instructions with a long critical dependency chain involving a cmov and other weird stuff.
The same is true if you use the unlikely hint, and also across all icc versions (13, 14, 17) that godbolt supports. So the code generation is strictly worse, regardless of the hint, and regardless of the actual runtime behavior.
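Because the builtin only passes its first argument through, the hinted and unhinted loops should compute identical results, and only the generated code should differ. A minimal sketch to check that on any of the three compilers is below. The names foo_plain, foo_likely, foo_unlikely and check_same_results are hypothetical, and the fallback macros for compilers without the builtin are an added assumption.

```c
#include <assert.h>

/* Same macros as in the question; the #else fallback is an added assumption. */
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define likely(x)   __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Unhinted loop from the question. */
int foo_plain(int a, int b)
{
    do {
        a *= 77;
    } while (b-- > 0);
    return a * 77;
}

/* Same loop with the "likely" hint on the loop condition. */
int foo_likely(int a, int b)
{
    do {
        a *= 77;
    } while (likely(b-- > 0));
    return a * 77;
}

/* Same loop with the "unlikely" hint on the loop condition. */
int foo_unlikely(int a, int b)
{
    do {
        a *= 77;
    } while (unlikely(b-- > 0));
    return a * 77;
}

/* Compare the variants on small inputs only, so that a never overflows. */
void check_same_results(void)
{
    for (int b = -1; b <= 2; ++b) {
        assert(foo_likely(2, b) == foo_plain(2, b));
        assert(foo_unlikely(2, b) == foo_plain(2, b));
    }
}
```

This only confirms that the hint is semantically neutral; the difference in instruction count still has to be read from the compiler's assembly output.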
Neither gcc nor clang suffers any degradation when hints are used.
What's up with that?
¹ At least in the first and subsequent examples I tried.
To me it seems to be an ICC bug. This code (available on godbolt)
    int c;
    do
    {
        a *= 77;
        c = b--;
    }
    while (likely(c > 0));
which simply uses an auxiliary local variable c, produces output without the edx = !!(esi > 0) pattern:
    foo(int, int):
    ..B1.2:
            mov       eax, esi
            dec       esi
            imul      edi, edi, 77
            test      eax, eax
            jg        ..B1.2
This is still not optimal (it could do without eax), though.
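For anyone who wants to paste the workaround into a compiler, here is the snippet above completed into a full function, as a sketch. The name foo_aux is hypothetical, and the final return a * 77 is assumed to be unchanged from the question's foo, since the snippet only shows the loop. The small self-check uses inputs that stay far from int overflow.

```c
#include <assert.h>

/* Macro as in the question; the #else fallback is an added assumption. */
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define likely(x) __builtin_expect((x), 1)
#else
#define likely(x) (x)
#endif

/* Workaround variant: the post-decrement is moved out of the hinted
   expression and its old value is kept in an auxiliary local c. */
int foo_aux(int a, int b)
{
    int c;
    do {
        a *= 77;
        c = b--;              /* c holds the value of b before the decrement */
    } while (likely(c > 0));
    return a * 77;            /* assumed: same epilogue as the original foo */
}

/* b = 0 runs the loop once, b = 1 runs it twice. */
void check_foo_aux(void)
{
    assert(foo_aux(1, 0) == 77 * 77);
    assert(foo_aux(1, 1) == 77 * 77 * 77);
}
```

The hinted condition here has no side effect, which is the only source-level difference from the version that triggers the bad code.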
I don't know if the official ICC policy about __builtin_expect is full support or just compatibility support.
This question seems better suited for the Official ICC forum.
I've tried posting this topic there, but I'm not sure I've done a good job (I've been spoiled by SO).
If they answer me I'll update this answer.
EDIT
I've got an answer at the Intel Forum; they recorded this issue in their tracking system.
As of today, it seems to be a bug.