Consider the following code:
// Preprocessor
#include <iostream>
#include <vector>
// Internal branching
void f1(std::vector<int>& v, const int x = 0)
{
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = (x != 0) ? (v[i-1]*x) : (v[i-1]+v[i-1]);
}
}
// External branching
void f2(std::vector<int>& v, const int x = 0)
{
if (x != 0) {
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = v[i-1]*x;
}
} else {
for (unsigned int i = 1; i < v.size(); ++i) {
v[i] = v[i-1]+v[i-1];
}
}
}
// Main
int main()
{
std::vector<int> v(10, 2);
f1(v);
f2(v);
return 0;
}
It illustrates the behaviour of two functions that produce the same result:
f1: the condition is tested inside of the loopf2: the condition is tested outside of the loopThe branching is based on x, which is declared as const.
My question is: are compilers sufficiently intelligent to transform f1 in f2 when all optimization levels are turned on?
The best way to check if your compiler is hoisting the conditional out the loop is to really inspect the assembly after compiling it with full optimizations.
After building your example with:
g++ -O3 -c example.cpp -o example.o
objdump -d -M intel example.o > example.S
Here's what I got for f1:
00000020 <f1(std::vector<int, std::allocator<int> >&, int)>:
; ...
23: 8b 54 24 10 mov edx,DWORD PTR [esp+0x10]
27: 8b 7c 24 14 mov edi,DWORD PTR [esp+0x14]
2b: 8b 02 mov eax,DWORD PTR [edx]
2d: 8b 4a 04 mov ecx,DWORD PTR [edx+0x4]
30: 29 c1 sub ecx,eax
32: c1 f9 02 sar ecx,0x2
35: 83 f9 01 cmp ecx,0x1
38: 76 d jbe 57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>
3a: 31 db xor ebx,ebx
3c: 85 ff test edi,edi
3e: ba 01 00 00 00 mov edx,0x1
43: 75 b jne 60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
45: 8b 34 18 mov esi,DWORD PTR [eax+ebx*1]
48: 83 c3 04 add ebx,0x4
4b: 01 f6 add esi,esi
4d: 89 34 90 mov DWORD PTR [eax+edx*4],esi
50: 83 c2 01 add edx,0x1
53: 39 d1 cmp ecx,edx
55: 75 ee jne 45 <f1(std::vector<int, std::allocator<int> >&, int)+0x25>
57: 5b pop ebx
58: 5e pop esi
59: 5f pop edi
5a: c3 ret
5b: 90 nop
5c: 8d 74 26 00 lea esi,[esi+eiz*1+0x0]
60: 8b 34 18 mov esi,DWORD PTR [eax+ebx*1]
63: 83 c3 04 add ebx,0x4
66: 0f af f7 imul esi,edi
69: 89 34 90 mov DWORD PTR [eax+edx*4],esi
6c: 83 c2 01 add edx,0x1
6f: 39 ca cmp edx,ecx
71: 75 ed jne 60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
73: eb e2 jmp 57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>
On line 3c, you find the conditional being checked:
; if(x != 0)
3c: 85 ff test edi,edi
43: 75 b jne 60 ; ...
From that point on after the check, x is never tested again and the loops are just executed for each part. The first loop starting from line 45 to 55 is done when x == 0. The second loop starting from line 60 to 71 is done when x != 0.
So yes, at least in this case, gcc is able to hoist the conditional out of the loop with full optimizations enable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With