 

Why does this very simple C# method produce such illogical CIL code?

I've been digging into IL recently, and I noticed some odd behavior of the C# compiler. The following method is a very simple, verifiable application that immediately exits with exit code 1:

static int Main(string[] args)
{
    return 1;
}

When I compile this with Visual Studio Community 2015, the following IL code is generated (comments added):

.method private hidebysig static int32 Main(string[] args) cil managed
{
  .entrypoint
  .maxstack  1
  .locals init ([0] int32 V_0)     // Local variable init
  IL_0000:  nop                    // Do nothing
  IL_0001:  ldc.i4.1               // Push '1' to stack
  IL_0002:  stloc.0                // Pop stack to local variable 0
  IL_0003:  br.s       IL_0005     // Jump to next instruction
  IL_0005:  ldloc.0                // Load local variable 0 onto stack
  IL_0006:  ret                    // Return
}

If I were to handwrite this method, seemingly the same result could be achieved with the following IL:

.method static int32 Main()
{
  .entrypoint
  ldc.i4.1               // Push '1' to stack
  ret                    // Return
}

Are there underlying reasons that I'm not aware of that make this the expected behaviour?

Or is it just that the assembled IL object code is optimized further down the line, so the C# compiler does not have to worry about optimization?

asked Jan 25 '18 14:01 by lpmitchell




2 Answers

The output you've shown is for a debug build. With a release build (or basically with optimizations turned on) the C# compiler generates the same IL you'd have written by hand.

I strongly suspect that this is all to make the debugger's work easier: to make it simpler to break, and also to see the return value before it's returned.

Moral: when you want to run optimized code, make sure you're not asking the compiler to generate code that's aimed at debugging :)
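One way to see the difference without a disassembler is to read a method's raw IL through reflection and build the same program once in Debug and once in Release. This is a minimal sketch that assumes a method with the same body as the question's `Main`; `IlSizeDemo`, `ReturnOne` and `GetReturnOneIl` are hypothetical names, while `MethodBase.GetMethodBody` and `MethodBody.GetILAsByteArray` are standard reflection APIs.

```csharp
using System;
using System.Reflection;

static class IlSizeDemo
{
    // Hypothetical method with the same body as the question's Main.
    internal static int ReturnOne()
    {
        return 1;
    }

    // Reads back the raw IL bytes the C# compiler emitted for ReturnOne.
    internal static byte[] GetReturnOneIl()
    {
        MethodInfo method = typeof(IlSizeDemo).GetMethod(
            nameof(ReturnOne), BindingFlags.Static | BindingFlags.NonPublic);
        return method.GetMethodBody().GetILAsByteArray();
    }

    static void Main()
    {
        byte[] il = GetReturnOneIl();
        // Prints the byte count and the bytes in hex, e.g. "2 bytes: 17-2A".
        // The exact bytes depend on whether optimizations were enabled.
        Console.WriteLine($"{il.Length} bytes: {BitConverter.ToString(il)}");
    }
}
```

The opcode encodings are nop = 0x00, ldc.i4.1 = 0x17, stloc.0 = 0x0A, br.s = 0x2B (followed by a one-byte offset, here 0x00), ldloc.0 = 0x06 and ret = 0x2A. So a Debug build would be expected to print the question's listing as 00-17-0A-2B-00-06-2A, and a Release build the hand-written one as 17-2A.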

answered Oct 18 '22 00:10 by Jon Skeet



Jon's answer is of course correct; this answer is to follow up on this comment:

@EricLippert the local makes perfect sense, but is there any rationale for that br.s instruction, or is it just there out of convenience in the emitter code? I guess that if the compiler wanted to insert a breakpoint placeholder there, it could just emit a nop...

The reason for the seemingly senseless branch becomes more sensible if you look at a more complicated program fragment:

public int M(bool b) {
    if (b) 
      return 1; 
    else 
      return 2;
}

The unoptimized IL is

    IL_0000: nop
    IL_0001: ldarg.1
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: brfalse.s IL_000a
    IL_0006: ldc.i4.1
    IL_0007: stloc.1
    IL_0008: br.s IL_000e
    IL_000a: ldc.i4.2
    IL_000b: stloc.1
    IL_000c: br.s IL_000e
    IL_000e: ldloc.1
    IL_000f: ret

Notice that there are two return statements but only one ret instruction. In unoptimized IL, the pattern for codegen'ing a simple return statement is:

  • stuff the value you're going to return into a stack slot
  • branch/leave to the end of the method
  • at the end of the method, read the value out of the slot and return

That is, the unoptimized code uses single-point-of-return form.
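To make the single-point-of-return pattern concrete, here is a hand-written C# analogue of what the unoptimized codegen does with `M`. It is only a source-level illustration, since the compiler does this internally while emitting IL rather than by producing C#. `SinglePointOfReturn` and `MSingleExit` are hypothetical names, and the comments map each statement to the IL listing above.

```csharp
using System;

static class SinglePointOfReturn
{
    // The method from the answer, with two return statements.
    internal static int M(bool b)
    {
        if (b)
            return 1;
        else
            return 2;
    }

    // Hand-written analogue of the unoptimized pattern: each "return" stores
    // its value in one shared local and jumps to a single exit point.
    internal static int MSingleExit(bool b)
    {
        int result;          // plays the role of local V_1 in the IL above
        if (b)
        {
            result = 1;      // ldc.i4.1; stloc.1
            goto Exit;       // br.s IL_000e (jumps over the else block)
        }
        else
        {
            result = 2;      // ldc.i4.2; stloc.1
            goto Exit;       // br.s IL_000e (a branch to the very next instruction)
        }
    Exit:
        return result;       // ldloc.1; ret
    }

    static void Main()
    {
        foreach (bool b in new[] { true, false })
            Console.WriteLine($"M({b}) = {M(b)}, MSingleExit({b}) = {MSingleExit(b)}");
    }
}
```

In the `else` branch, `goto Exit` jumps to the statement that would run next anyway, which is the redundant branch seen at IL_000c.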

In both this case and the simple case shown by the original poster, that pattern causes a "branch to next" situation to be generated. The "remove any branch to next" optimizer does not run when generating unoptimized code, so it remains.
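The "remove any branch to next" optimization is a simple peephole pass. The sketch below is not the C# compiler's implementation. It uses a hypothetical `Instr` type with string labels in place of real IL offsets (so no offsets need recomputing), drops every unconditional `br`/`br.s` whose target is the following instruction, and retargets any branch that pointed at a removed instruction. Run over the unoptimized IL for `M`, it removes only the `br.s` at IL_000c; the one at IL_0008 jumps over the `else` block and has to stay.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified instruction model: labels stand in for IL offsets.
sealed class Instr
{
    public readonly string Label;
    public readonly string OpCode;
    public readonly string Target; // branch target label, or null for non-branches

    public Instr(string label, string opCode, string target = null)
    {
        Label = label;
        OpCode = opCode;
        Target = target;
    }

    public override string ToString() =>
        Target == null ? $"{Label}: {OpCode}" : $"{Label}: {OpCode} {Target}";
}

static class BranchToNextPeephole
{
    // Drops unconditional branches whose target is the instruction that
    // immediately follows them, since falling through has the same effect.
    internal static List<Instr> RemoveBranchToNext(List<Instr> code)
    {
        var alias = new Dictionary<string, string>(); // removed label -> following label
        var kept = new List<Instr>();
        for (int i = 0; i < code.Count; i++)
        {
            Instr ins = code[i];
            bool unconditional = ins.OpCode == "br" || ins.OpCode == "br.s";
            if (unconditional && i + 1 < code.Count && ins.Target == code[i + 1].Label)
            {
                alias[ins.Label] = code[i + 1].Label;
                continue;
            }
            kept.Add(ins);
        }

        // Any branch that targeted a removed instruction now targets its successor.
        var result = new List<Instr>();
        foreach (Instr ins in kept)
            result.Add(new Instr(ins.Label, ins.OpCode, Resolve(alias, ins.Target)));
        return result;
    }

    // Follows the alias chain forward; chains cannot loop because every
    // removed label maps to a later instruction.
    static string Resolve(Dictionary<string, string> alias, string label)
    {
        string next;
        while (label != null && alias.TryGetValue(label, out next))
            label = next;
        return label;
    }

    static void Main()
    {
        // The unoptimized IL for M(bool b) from the answer above.
        var code = new List<Instr>
        {
            new Instr("IL_0000", "nop"),
            new Instr("IL_0001", "ldarg.1"),
            new Instr("IL_0002", "stloc.0"),
            new Instr("IL_0003", "ldloc.0"),
            new Instr("IL_0004", "brfalse.s", "IL_000a"),
            new Instr("IL_0006", "ldc.i4.1"),
            new Instr("IL_0007", "stloc.1"),
            new Instr("IL_0008", "br.s", "IL_000e"),
            new Instr("IL_000a", "ldc.i4.2"),
            new Instr("IL_000b", "stloc.1"),
            new Instr("IL_000c", "br.s", "IL_000e"),
            new Instr("IL_000e", "ldloc.1"),
            new Instr("IL_000f", "ret"),
        };

        foreach (Instr ins in RemoveBranchToNext(code))
            Console.WriteLine(ins);
    }
}
```

Note that this pass alone leaves the `nop` and the store/load through the local in place; an optimized build removes those through other transformations.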

answered Oct 17 '22 23:10 by Eric Lippert
