Say I have code like this, call it version 1:
while (some_condition) {
// .. work that may trigger rare_condition ...
if (rare_condition) {
// .. rare work here ..
continue;
}
// .. work that may trigger rare_condition ...
if (rare_condition) {
// .. rare work here ..
continue;
}
// .. more work
}
Assume that the "rare work" is identical across both cases. We could equivalently write version 2:
while (some_condition) {
// .. work that may trigger rare_condition ...
if (rare_condition) {
goto rare_work;
}
// .. work that may trigger rare_condition ...
if (rare_condition) {
goto rare_work;
}
// .. more work
continue;
rare_work:
// .. rare work here ..
}
The compiler should usually be smart enough to make the if()
checks result in a jump directly to rare_work
, rather than a jump to a block containing a jump to rare_work
. The compiler may even transform version 1 to version 2 for us, because it reduces the overall amount of assembly, increasing the chance of fitting in instruction cache.
My question is: if instruction cache is not an issue, is there any reason to expect one or the other to perform better? This is assuming we care about low level micro-optimization and are willing to write the code in assembly, if we have to in order to force getting one version or the other. I realize things like port availability could change based on what the work instructions actually are, I'm just wondering if there is any high level reason to expect a difference before doing such an analysis.
My question is: if instruction cache is not an issue, is there any reason to expect one or the other to perform better?
The performance hiccup here would be the number of branches. If there is a branch prediction miss, that's when you lose performance. And the best counter measure is simply to try to reduce the number of branches. Since your code is pseudo code, there's no telling if doing so is possible. But that's what manual optimization should focus on, rather than code re-ordering.
From a pure readability point of view, it would be perfectly possible to rewrite the loop without any "spaghetti programming" aka continue/goto and still keep the same performance:
while(some_condition)
{
bool is_rare;
is_rare = work1();
if(is_rare)
{
rare();
}
else
{
is_rare = work2();
if(is_rare)
{
rare();
}
}
}
Where all functions are static inline
. While trying this out in gcc x86 -O3 I got assembler code pretty much identical to your goto
version, with rare()
inlined at the bottom of the loop. Which at least proves that we don't have to write spaghetti on the C level.
Artificial example code: https://godbolt.org/z/M8EznYr1z. .L5:
is the rare()
function inlined.
do_stuff:
sub rsp, 24
.L12:
movzx eax, BYTE PTR [rsp+15]
test al, al
je .L1
.L14:
mov edi, OFFSET FLAT:.LC0
call puts
cmp BYTE PTR rare_[rip], 0
jne .L5
mov edi, OFFSET FLAT:.LC2
call puts
cmp BYTE PTR rare_[rip], 0
je .L12
.L5:
mov edi, OFFSET FLAT:.LC1
call puts
movzx eax, BYTE PTR [rsp+15]
test al, al
jne .L14
.L1:
add rsp, 24
ret
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With