I have written a simple Fibonacci function as an exercise in C++ (using Visual Studio) to test Tail Recursion and to see how it works.
this is the code:
int fib_tail(int n, int res, int next) {
  if (n == 0) {
    return res;
  }
  return fib_tail(n - 1, next, res + next);
}
int main()
{
  fib_tail(10,0,1); //Tail Recursion works
}
when I compiled using Release mode I saw the optimized assembly using the JMP instruction in spite of a call. So my conclusion was: tail recursion works. See image below:

I wanted to do some performance tests by increasing the input variable n in my Fibonacci function. I then opted to change the variable type, used in the function, from int to unsigned long long. Then I passed a big number like: 10e+08
This is now the new function:
typedef  unsigned long long ULONG64;
ULONG64 fib_tail(ULONG64 n, ULONG64 res, ULONG64 next) {
   if (n == 0) {
     return res;
   }
   return fib_tail(n - 1, next, res + next);
}
int main()
{
  fib_tail(10e+9,0,1); //Tail recursion does not work
}
When I ran the code above I got a stack overflow exception, which made me think that tail recursion was not working. I looked at the assembly and in fact I found this:

As you see now there is a call instruction whereas I was expecting only a simple JMP. I don't understand the reason why using a 8 bytes variable disables tail recursion. Why the compiler doesn't perform an optimization in such case?
This is one of those questions that you'd have to ask the guys that do compiler optimisation for MS - there is really no technical reason why ANY return type should prevent tail-recursion from being a jump as such - there may be OTHER reasons such as "the code is too complex to understand" or some such.
clang 3.7 as of a couple of weeks back clearly figures it out:
_Z8fib_tailyyy:                         # @_Z8fib_tailyyy
    pushl   %ebp
    pushl   %ebx
    pushl   %edi
    pushl   %esi
    pushl   %eax
    movl    36(%esp), %ecx
    movl    32(%esp), %esi
    movl    28(%esp), %edi
    movl    24(%esp), %ebx
    movl    %ebx, %eax
    orl %edi, %eax
    je  .LBB0_1
    movl    44(%esp), %ebp
    movl    40(%esp), %eax
    movl    %eax, (%esp)            # 4-byte Spill
.LBB0_3:                                # %if.end
    movl    %ebp, %edx
    movl    (%esp), %eax            # 4-byte Reload
    addl    $-1, %ebx
    adcl    $-1, %edi
    addl    %eax, %esi
    adcl    %edx, %ecx
    movl    %ebx, %ebp
    orl %edi, %ebp
    movl    %esi, (%esp)            # 4-byte Spill
    movl    %ecx, %ebp
    movl    %eax, %esi
    movl    %edx, %ecx
    jne .LBB0_3
    jmp .LBB0_4
.LBB0_1:
    movl    %esi, %eax
    movl    %ecx, %edx
.LBB0_4:                                # %return
    addl    $4, %esp
    popl    %esi
    popl    %edi
    popl    %ebx
    popl    %ebp
    retl
main:                                   # @main
    subl    $28, %esp
    movl    $0, 20(%esp)
    movl    $1, 16(%esp)
    movl    $0, 12(%esp)
    movl    $0, 8(%esp)
    movl    $2, 4(%esp)
    movl    $1410065408, (%esp)     # imm = 0x540BE400
    calll   _Z8fib_tailyyy
    movl    %edx, f+4
    movl    %eax, f
    xorl    %eax, %eax
    addl    $28, %esp
    retl
Same applies to gcc 4.9.2 if you give it -O2 (but not in -O1 which was all clang needed)
(And of course also in 64-bit mode)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With