When I compile the following simple recursive function with g++, the generated assembly simply returns i, as if g++ could do the same algebraic reasoning a human can.
int Identity(int i) {
    if (i == 1)
        return 1;
    else
        return Identity(i - 1) + 1;
}
I don't think this optimization is just tail-call elimination; apparently g++ must at least do two things: turn the non-tail recursion into a loop with an accumulator, and then recognize that the loop simply computes i.
How to reproduce
% g++ -v
gcc version 8.2.1 20181127 (GCC)
% g++ a.cpp -c -O2 && objdump -d a.o
Disassembly of section .text:
0000000000000000 <_Z8Identityi>:
0: 89 f8 mov %edi,%eax
2: c3 retq
Updated: Thanks to everyone who answered. I have collected some of the discussion and updates here.
Updated + answered: Thanks to the answer below (I have marked it as helpful; also check the answer from manlio), I think I understand how a compiler can do this fairly easily. See the example below. First, modern gcc can do something more powerful than plain tail-recursion elimination, so the code is converted into something like this:
// Equivalent to: return x
int Identity_v2(int x) {
    int ans = 0;
    for (int i = x; i != 0; i--, ans++) {}
    return ans;
}
// Equivalent to: return x >= 0 ? x + 1 : 0
int Identity_v3(int x) {
    int ans = 0;
    for (int i = x; i >= 0; i--, ans++) {}
    return ans;
}
(I guess that) the compiler can see that ans and i change in lockstep (ans goes up by one each time i goes down by one) and that i == 0 when the loop exits, so after the loop ans must equal the initial value x; therefore the compiler knows it can simply return x. In v3, I use the >= operator so that the compiler also checks the sign of the input for me.
This could be much simpler than I had guessed.
gcc is able to optimize recursion even in the case of non-tail-recursive calls.
GCC's optimization passes work on an intermediate representation of your code in a format called GIMPLE. Using the -fdump-* options, you can ask GCC to dump the intermediate states of this representation and discover many details about the optimizations performed.
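For example, a command along these lines (assuming the same a.cpp as above) makes GCC write one dump file per pass next to the object file:
% g++ a.cpp -c -O2 -fdump-tree-all
Among the generated files you should find dumps with suffixes such as .gimple, .tailr1 and .optimized.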
In this case the interesting files are the dumps quoted below (the numbers in their names may vary depending on the GCC version):
This is the starting point:
int Identity(int) (int i)
{
int D.2330;
int D.2331;
int D.2332;
if (i == 1) goto <D.2328>; else goto <D.2329>;
<D.2328>:
D.2330 = 1;
return D.2330;
<D.2329>:
D.2331 = i + -1;
D.2332 = Identity (D.2331);
D.2330 = D.2332 + 1;
return D.2330;
}
This is the last dump in which the recursion is still present:
int Identity(int) (int i)
{
int _1;
int _6;
int _8;
int _10;
<bb 2>:
if (i_3(D) == 1)
goto <bb 4>;
else
goto <bb 3>;
<bb 3>:
_6 = i_3(D) + -1;
_8 = Identity (_6);
_10 = _8 + 1;
<bb 4>:
# _1 = PHI <1(2), _10(3)>
return _1;
}
As is normal with SSA, GCC inserts fake functions known as PHI nodes at the start of basic blocks where needed, in order to merge the multiple possible values of a variable.
Here:
# _1 = PHI <1(2), _10(3)>
_1 either gets the value 1 or the value of _10, depending on whether we reach this point via block 2 or block 3.
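Read back into source form (my own rendering, not GCC output), this dump is still just the original branchy recursion, with the PHI playing the role of the conditional that merges the two return values:
// The PHI merges the constant 1 (from block 2) with _10, which is
// Identity(i - 1) + 1 (computed in block 3), exactly like this conditional:
int Identity_as_in_dump(int i) {
    return (i == 1) ? 1                               // value from block 2
                    : Identity_as_in_dump(i - 1) + 1; // value _10 from block 3
}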
This is the first dump in which the recursion has been turned into a loop:
int Identity(int) (int i)
{
int _1;
int add_acc_4;
int _6;
int acc_tmp_8;
int add_acc_10;
<bb 2>:
# i_3 = PHI <i_9(D)(0), _6(3)>
# add_acc_4 = PHI <0(0), add_acc_10(3)>
if (i_3 == 1)
goto <bb 4>;
else
goto <bb 3>;
<bb 3>:
_6 = i_3 + -1;
add_acc_10 = add_acc_4 + 1;
goto <bb 2>;
<bb 4>:
# _1 = PHI <1(2)>
acc_tmp_8 = add_acc_4 + _1;
return acc_tmp_8;
}
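Read as ordinary C++, that dump corresponds roughly to the following hand-written sketch (my own rendering of what the dump computes, not GCC output):
// Count how many times i is stepped down to the base case,
// then add the base case's return value of 1.
int Identity_loop(int i) {
    int add_acc = 0;            // corresponds to add_acc_4 / add_acc_10
    while (i != 1) {            // bb 2: loop until the base case
        add_acc = add_acc + 1;  // bb 3: accumulate the "+ 1" of each recursive frame
        i = i - 1;              // bb 3: the argument of the eliminated recursive call
    }
    return add_acc + 1;         // bb 4: acc_tmp_8 = add_acc_4 + 1
}
// As with the original recursion, inputs below 1 never reach the base case.
For i >= 1 the loop runs i - 1 times, so add_acc + 1 is simply i; later passes can replace the loop with that closed-form value, which is presumably why the final assembly just returns its argument.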
The same optimisation that handles tail calls also handles trivial cases of making the call tail recursive by creating accumulators.
There is a very similar example in the starting comment of the https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-tailcall.c file:
The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level where they are used for sibcall optimization.
In addition to the standard tail recursion elimination, we handle the most trivial cases of making the call tail recursive by creating accumulators.
For example the following function
int sum (int n)
{
  if (n > 0)
    return n + sum (n - 1);
  else
    return 0;
}
is transformed into
int sum (int n)
{
  int acc = 0;
  while (n > 0)
    acc += n--;
  return acc;
}
To do this, we maintain two accumulators (a_acc and m_acc) that indicate when we reach the return x statement, we should return a_acc + x * m_acc instead. They are initially initialized to 0 and 1, respectively, so the semantics of the function is obviously preserved. If we are guaranteed that the value of the accumulator never change, we omit the accumulator.

There are three cases how the function may exit. The first one is handled in adjust_return_value, the other two in adjust_accumulator_values (the second case is actually a special case of the third one and we present it separately just for clarity):

- Just return x, where x is not in any of the remaining special shapes. We rewrite this to a gimple equivalent of return m_acc * x + a_acc.
- return f (...), where f is the current function, is rewritten in a classical tail-recursion elimination way, into assignment of arguments and jump to the start of the function. Values of the accumulators are unchanged.
- return a + m * f(...), where a and m do not depend on call to f. To preserve the semantics described before we want this to be rewritten in such a way that we finally return a_acc + (a + m * f(...)) * m_acc = (a_acc + a * m_acc) + (m * m_acc) * f(...). I.e. we increase a_acc by a * m_acc, multiply m_acc by m and eliminate the tail call to f. Special cases when the value is just added or just multiplied are obtained by setting a = 0 or m = 1.
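To make the a_acc / m_acc bookkeeping concrete, here is a hand-written example of the rewrite described above (my own illustration using a made-up function, not code from GCC):
// Hypothetical input: a non-tail-recursive call of the shape a + m * f(...),
// with a = 3, m = 2 and base case f(0) = 7.
int f(int n) {
    if (n == 0)
        return 7;
    return 3 + 2 * f(n - 1);
}

// Accumulator rewrite: a_acc starts at 0, m_acc starts at 1, and each
// eliminated call updates a_acc += a * m_acc and m_acc *= m.
int f_acc(int n) {
    int a_acc = 0;
    int m_acc = 1;
    while (n != 0) {
        a_acc = a_acc + 3 * m_acc;  // a_acc += a * m_acc, with a = 3
        m_acc = m_acc * 2;          // m_acc *= m, with m = 2
        n = n - 1;                  // jump back to the start with the new argument
    }
    return a_acc + 7 * m_acc;       // the base case's "return x" becomes a_acc + x * m_acc
}
In the Identity example both a and m are 1 for every call, so only the additive accumulator survives, which is exactly the add_acc variable visible in the loop dump above.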