After reading many of the comments on this question, there are a couple people (here and here) that suggest that this code:
int val = 5;
int r = (0 < val) - (val < 0); // this line here
will cause branching. Unfortunately, none of them give any justification or say why it would cause branching (tristopia suggests it requires a cmove
-like instruction or predication, but doesn't really say why).
Are these people right in that "a comparison used in an expression will not generate a branch" is actually myth instead of fact? (assuming you're not using some esoteric processor) If so, can you give an example?
I would've thought there wouldn't be any branching (given that there's no logical "short circuiting"), and now I'm curious.
To simplify matters, consider just one part of the expression: val < 0
. Essentially, this means “if val
is negative, return 1
, otherwise 0
”; you could also write it like this:
val < 0 ? 1 : 0
How this is translated into processor instructions depends heavily on the compiler and the target processor. The easiest way to find out is to write a simple test function, like so:
int compute(int val) {
return val < 0 ? 1 : 0;
}
and review the assembler code that is generated by the compiler (e.g., with gcc -S -o - example.c
). For my machine, it does it without branching. However, if I change it to return 5
instead of 1
, there are branch instructions:
...
cmpl $0, -4(%rbp)
jns .L2
movl $5, %eax
jmp .L3
.L2:
movl $0, %eax
.L3:
...
So, “a comparison used in an expression will not generate a branch” is indeed a myth. (But “a comparison used in an expression will always generate a branch” isn’t true either.)
Addition in response to this extension/clarification:
I'm asking if there's any (sane) platform/compiler for which a branch is likely. MIPS/ARM/x86(_64)/etc. All I'm looking for is one case that demonstrates that this is a realistic possibility.
That depends on what you consider a “sane” platform. If the venerable 6502 CPU family is sane, I think there is no way to calculate val > 0
on it without branching. Most modern instruction sets, on the other hand, provide some type of set-on-X instruction.
(val < 0
can actually be computed without branching even on 6502, because it can be implemented as a bit shift.)
Empiricism for the win:
int sign(int val) {
return (0 < val) - (val < 0);
}
compiled with optimisations. gcc (4.7.2) produces
sign:
.LFB0:
.cfi_startproc
xorl %eax, %eax
testl %edi, %edi
setg %al
shrl $31, %edi
subl %edi, %eax
ret
.cfi_endproc
no branch. clang (3.2):
sign: # @sign
.cfi_startproc
# BB#0:
movl %edi, %ecx
shrl $31, %ecx
testl %edi, %edi
setg %al
movzbl %al, %eax
subl %ecx, %eax
ret
neither. (on x86_64, Core i5)
This is actually architecture-dependent. If there exists an instruction to set the value to 0/1 depending on the sign of another value, there will be no branching. If there's no such instruction, branching would be necessary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With