I've been digging into IL recently, and I noticed some odd behavior of the C# compiler. The following method is a very simple, verifiable application that immediately exits with exit code 1:
static int Main(string[] args)
{
    return 1;
}
When I compile this with Visual Studio Community 2015, the following IL code is generated (comments added):
.method private hidebysig static int32 Main(string[] args) cil managed
{
.entrypoint
.maxstack 1
.locals init ([0] int32 V_0) // Local variable init
IL_0000: nop // Do nothing
IL_0001: ldc.i4.1 // Push '1' to stack
IL_0002: stloc.0 // Pop stack to local variable 0
IL_0003: br.s IL_0005 // Jump to next instruction
IL_0005: ldloc.0 // Load local variable 0 onto stack
IL_0006: ret // Return
}
If I were to handwrite this method, seemingly the same result could be achieved with the following IL:
.method static int32 Main()
{
.entrypoint
ldc.i4.1 // Push '1' to stack
ret // Return
}
Are there underlying reasons that I'm not aware of that make this the expected behaviour?
Or is it just that the assembled IL is optimized further down the line, so the C# compiler does not have to worry about optimization?
The output you've shown is for a debug build. With a release build (or basically with optimizations turned on) the C# compiler generates the same IL you'd have written by hand.
I strongly suspect that this is all to make the debugger's work easier: it makes it simpler to set a breakpoint, and also to see the return value before it's returned.
Moral: when you want to run optimized code, make sure you're not asking the compiler to generate code that's aimed at debugging :)
Jon's answer is of course correct; this answer is to follow up on this comment:
@EricLippert the local makes perfect sense, but is there any rationale for that br.s instruction, or is it just there out of convenience in the emitter code? I guess that if the compiler wanted to insert a breakpoint placeholder there, it could just emit a nop...
The reason for the seemingly senseless branch becomes more sensible if you look at a more complicated program fragment:
public int M(bool b) {
    if (b)
        return 1;
    else
        return 2;
}
The unoptimized IL is
IL_0000: nop
IL_0001: ldarg.1
IL_0002: stloc.0
IL_0003: ldloc.0
IL_0004: brfalse.s IL_000a
IL_0006: ldc.i4.1
IL_0007: stloc.1
IL_0008: br.s IL_000e
IL_000a: ldc.i4.2
IL_000b: stloc.1
IL_000c: br.s IL_000e
IL_000e: ldloc.1
IL_000f: ret
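Notice that there are two return statements but only one ret instruction. In unoptimized IL, the pattern for codegen'ing a simple return statement is: store the value to be returned in a local, branch to the end of the method, and, at the end of the method, load the value back from the local and return it.
That is, the unoptimized code uses single-point-of-return form.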
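To make the single-point-of-return shape easier to see, here is a rough C# rendering of what the unoptimized IL above does, written with explicit gotos. This is only an illustrative sketch: the class name, the local names flag and result, and the label names are made up for this example, and the method is static so it can be called without an instance. It is not what the compiler literally produces as an intermediate form.

```csharp
public static class SinglePointOfReturnSketch
{
    // Hand-written mirror of the unoptimized IL for M(bool b).
    public static int M(bool b)
    {
        bool flag = b;              // ldarg / stloc.0
        int result;                 // local V_1, the "return slot"

        if (!flag) goto elseBranch; // ldloc.0 / brfalse.s IL_000a

        result = 1;                 // ldc.i4.1 / stloc.1
        goto end;                   // br.s IL_000e

    elseBranch:
        result = 2;                 // ldc.i4.2 / stloc.1
        goto end;                   // br.s IL_000e (a branch to the very next instruction)

    end:
        return result;              // ldloc.1 / ret, the only ret in the method
    }
}
```

Both return statements turn into "store, then jump to end", so the second jump lands on the instruction that immediately follows it.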
In both this case and the simple case shown by the original poster, that pattern causes a "branch to next" situation to be generated. The "remove any branch to next" optimizer does not run when generating unoptimized code, so it remains.
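For illustration, a "remove any branch to next" pass could look something like the sketch below. Everything here is an assumption made for the example: the Instr record, the BranchToNextSketch class, and the use of symbolic string labels are hypothetical and are not how the C# compiler actually represents or rewrites IL. The sketch also ignores the case where a removed branch is itself the target of another branch, which a real pass would have to handle by retargeting. It needs C# 9 or later for the record syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified instruction model (label, opcode, optional branch target).
public sealed record Instr(string Label, string OpCode, string Target = "");

public static class BranchToNextSketch
{
    // Returns a copy of the instruction list without any unconditional branch
    // whose target is the instruction that immediately follows it.
    public static List<Instr> RemoveBranchToNext(IReadOnlyList<Instr> code)
    {
        var result = new List<Instr>();
        for (int i = 0; i < code.Count; i++)
        {
            bool isUnconditionalBranch = code[i].OpCode == "br" || code[i].OpCode == "br.s";
            bool targetsNext = i + 1 < code.Count && code[i].Target == code[i + 1].Label;

            if (isUnconditionalBranch && targetsNext)
            {
                continue; // the branch has no effect, so drop it
            }
            result.Add(code[i]);
        }
        return result;
    }
}
```

Run over the debug IL for Main from the question, this would drop only the br.s at IL_0003. The nop and the store/load through local 0 would still be there, so the optimized output involves more than this one step.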