I've read many times that .NET technically does support tail call optimization (TCO) because it has an opcode for it; it's just that C# doesn't generate it.
I'm not exactly sure why TCO needs an opcode or what it would do. As far as I know, the requirement for being able to do TCO is that the results of a recursive call are not combined with any variables in the current function scope. If you don't have that, then I don't see how an opcode prevents you from having to keep a stack frame open. If you do have that, then can't the compiler always easily compile it to something iterative?
So what is the point of an opcode? Obviously there's something I'm missing. In cases where TCO is possible at all, can't it always be handled at the compiler level rather than at the opcode level? What's an example of where it can't?
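To make what I mean concrete, here is roughly the distinction I have in mind, as a made-up C# example (not code from any real library):

```csharp
static class TailCallDemo
{
    // Not a tail call: the recursive result is multiplied by n, so this frame
    // has to stay alive until the callee returns.
    static long Factorial(long n) => n <= 1 ? 1 : n * Factorial(n - 1);

    // A genuine tail call: the recursive call is the last action and its result
    // is returned unchanged, so (as I understand it) a compiler could rewrite
    // this into a loop without any special opcode.
    static long FactorialAcc(long n, long acc = 1) =>
        n <= 1 ? acc : FactorialAcc(n - 1, acc * n);
}
```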
Tail call optimization happens when the compiler transforms a call immediately followed by a ret into a single jmp. This transformation saves one instruction, and more importantly it eliminates the implicit push/pop from the stack done by call and ret.
This kind of function call is called a tail call, and languages like Haskell, Scala, and Scheme can avoid keeping around unnecessary stack frames in such calls. This is called tail call optimization (TCO) or tail call elimination.
In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail call is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion.
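The direct-recursion case is the one a source compiler can turn into a loop on its own. A rough C# sketch of the indirect case (method names invented for illustration), where that simple rewrite is not available, shows why runtime support matters:

```csharp
static class MutualRecursionDemo
{
    // Mutual (indirect) tail recursion: each call is in tail position, but it
    // targets a *different* method, so neither method can simply be rewritten
    // into a local loop by the source compiler. Eliminating these frames needs
    // cooperation from the runtime (e.g. the "tail." prefix plus the JIT).
    static bool IsEven(int n) => n == 0 ? true : IsOdd(n - 1);
    static bool IsOdd(int n)  => n == 0 ? false : IsEven(n - 1);
}
```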
Tail call optimisation isn't in the C++ standard. Apparently, some compilers, including MS Visual Studio and GCC, do provide tail call optimisation under certain circumstances (when optimisations are enabled, obviously).
Following the links you already provided, this is the part which, it seems to me, answers your question pretty closely:
Source
The CLR and tail calls
When you're dealing with languages managed by the CLR, there are two kinds of compilers in play. There's the compiler that goes from your language's source code down to IL (C# developers know this as csc.exe), and then there's the compiler that goes from IL to native code (the JIT 32/64 bit compilers that are invoked at run time or NGEN time). Both the source->IL and IL->native compilers understand the tail call optimization. But the IL->native compiler--which I'll just refer to as JIT--has the final say on whether the tail call optimization will ultimately be used. The source->IL compiler can help to generate IL that is conducive to making tail calls, including the use of the "tail." IL prefix (more on that later). In this way, the source->IL compiler can structure the IL it generates to persuade the JIT into making a tail call. But the JIT always has the option to do whatever it wants.
When does the JIT make tail calls?
I asked Fei Chen and Grant Richins, neighbors down the hall from me who happen to work on the JIT, under what conditions the various JITs will employ the tail call optimization. The full answer is rather detailed. The quick summary is that the JITs try to use the tail call optimization whenever they can, but there are lots of reasons why the tail call optimization can't be used. Some reasons why tail calling is a non-option:
- Caller doesn't return immediately after the call (duh :-))
- Stack arguments between caller and callee are incompatible in a way that would require shifting things around in the caller's frame before the callee could execute
- Caller and callee return different types
- We inline the call instead (inlining is way better than tail calling, and opens the door to many more optimizations)
- Security gets in the way
- The debugger / profiler turned off JIT optimizations
The most interesting part in the context of your question, and the one that makes it clearest in my opinion, is the security example mentioned above.
Security in .NET in many cases depends on the stack being accurate at runtime. That is why, as highlighted above, the burden is shared between the source-to-CIL compiler and the (runtime) CIL-to-native JIT compiler, with the final say resting with the latter.
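As a rough sketch of why eliminating a frame is visible at runtime (class and method names invented here; run without optimizations, since the JIT may simply inline such tiny methods anyway):

```csharp
using System;
using System.Diagnostics;

static class StackWalkDemo
{
    static void A() => B();
    static void B() => C();   // B -> C is a candidate for a tail call
    static void C() => Console.WriteLine(new StackTrace());

    // Security checks, debuggers, and profilers walk the managed stack.
    // If B tail-called C, B's frame would be gone, and the trace printed in C
    // would show C as if it were invoked directly from A.
    static void Main() => A();
}
```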
Guess: In a simple language like x86 assembler where you manage the stack "manually", you don't need an opcode - you can just set up the call stack appropriately.
But in something higher-level like .NET CIL, the stack is partially managed for you, and the whole act of invoking a function is a single opcode (e.g. call). So you need a different opcode to implement TCO - one that does "pass control flow to this function, but without creating a new stack frame".
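To make that concrete, here is roughly what the difference looks like, with the IL paraphrased in comments and the method invented for illustration (in CIL the mechanism is actually the "tail." prefix applied to a call, rather than a wholly separate opcode):

```csharp
static class CountDownDemo
{
    // For the recursive call below, the C# compiler today emits roughly:
    //     call    int64 CountDownDemo::CountDown(int64)
    //     ret
    // A compiler that wanted to request the optimization at the IL level
    // would instead emit:
    //     tail.
    //     call    int64 CountDownDemo::CountDown(int64)
    //     ret
    // i.e. "discard this frame before transferring control" -- a call that
    // does not grow the stack. (F# emits this prefix for tail calls;
    // C# currently does not.)
    static long CountDown(long n) => n == 0 ? 0 : CountDown(n - 1);
}
```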