
IL and stack implementation in .NET?

I wrote a simple program to examine how IL works:

void Main()
{
    int a = 5;
    int b = 6;
    if (a < b) Console.Write("333");
    Console.ReadLine();
}

The IL:

IL_0000:  ldc.i4.5    
IL_0001:  stloc.0     
IL_0002:  ldc.i4.6    
IL_0003:  stloc.1     
IL_0004:  ldloc.0     
IL_0005:  ldloc.1     
IL_0006:  bge.s       IL_0012
IL_0008:  ldstr       "333"
IL_000D:  call        System.Console.Write
IL_0012:  call        System.Console.ReadLine

I'm trying to understand how efficient the implementation is:

  • Line #1 of the IL pushes the value 5 onto the stack (4 bytes, since it is an int32).

  • Line #2 of the IL pops it from the stack into a local variable.

The same goes for the next two lines.

Then it loads those local variables back onto the stack, and only then does it evaluate bge.s.

Question #1

Why does it load the local variables onto the stack? The values were already on the stack, but it popped them in order to store them in local variables. Isn't that a waste?

I mean, why couldn't the code be something like:

IL_0000:  ldc.i4.5
IL_0001:  ldc.i4.6    
IL_0002:  bge.s       IL_0005
IL_0003:  ldstr       "333"
IL_0004:  call        System.Console.Write
IL_0005:  call        System.Console.ReadLine

My sample is just 5 lines of code. What about 50,000,000 lines of code? There will be plenty of extra IL emitted.

Question #2

Looking at the code addresses:

  • Where is the IL_0009 address? Aren't the addresses supposed to be sequential?

P.S. I'm compiling in Release mode with the Optimize flag on.

Royi Namir asked Dec 08 '12 11:12


2 Answers

I can answer the second question easily. The instructions are variable-length. For example, ldstr "333" consists of the one-byte opcode for ldstr (at address 0008) followed by a 4-byte operand identifying the string (a reference to the string in the user string table), so the next instruction starts at 000D.

The same applies to the call instructions that follow: each needs the call opcode itself plus the information identifying the method to call.

The instructions that push small values like 5 or 6 onto the stack don't carry extra data because the value is encoded in the opcode itself.
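
The address gaps described above can be reproduced with a little arithmetic. The following is a minimal Python sketch, not a real disassembler: the `OPERAND_SIZE` table and the `layout` helper are names made up for this illustration, and the table covers only the opcodes that appear in the question (all of them single-byte opcodes, with operand sizes as given in ECMA-335 Partition III).

```python
# Operand sizes in bytes for the opcodes used in the question's method.
# Hypothetical helper table for illustration; every opcode here is one byte long.
OPERAND_SIZE = {
    "ldc.i4.5": 0,  # the constant is encoded in the opcode itself
    "ldc.i4.6": 0,
    "stloc.0": 0,   # the local index is encoded in the opcode itself
    "stloc.1": 0,
    "ldloc.0": 0,
    "ldloc.1": 0,
    "bge.s": 1,     # 1-byte signed branch offset
    "ldstr": 4,     # 4-byte metadata token
    "call": 4,      # 4-byte metadata token
}

# The instruction sequence from the question, in order.
METHOD = [
    "ldc.i4.5", "stloc.0", "ldc.i4.6", "stloc.1",
    "ldloc.0", "ldloc.1", "bge.s", "ldstr", "call", "call",
]


def layout(instructions):
    """Return (offset, mnemonic) pairs: each step advances by 1 opcode byte plus operand bytes."""
    offset = 0
    result = []
    for name in instructions:
        result.append((offset, name))
        offset += 1 + OPERAND_SIZE[name]
    return result


for offset, name in layout(METHOD):
    print(f"IL_{offset:04X}:  {name}")
```

The printed offsets match the listing in the question: the label after IL_0006 is IL_0008 because bge.s occupies two bytes, and ldstr at IL_0008 pushes the next label out to IL_000D.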

See here for the instructions and encodings.

As to the first question, you may want to look at this blog entry by Eric Lippert, one of the C# developers, which states:

The /optimize flag does not change a huge amount of our emitting and generation logic. We try to always generate straightforward, verifiable code and then rely upon the jitter to do the heavy lifting of optimizations when it generates the real machine code.

paxdiablo answered Sep 18 '22 11:09


Why does it load the local variables onto the stack? The values were already on the stack, but it popped them in order to store them in local variables. Isn't that a waste?

A waste of what? You have to remember that IL (usually) isn't executed as is; it's compiled again by the JIT compiler, which performs most of the optimizations. One of the points of using an “intermediate language” is that optimizations can be implemented in one place, the JIT compiler, so each language (C#, VB.NET, F#, …) doesn't have to implement them all over again. This is explained by Eric Lippert in his article Why IL?
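
To see why the extra stloc/ldloc pairs carry no meaning of their own, here is a toy Python interpreter for a handful of IL-like instructions. It is only a sketch under loud assumptions: `run` and the tuple-based instruction format are invented for this example, branch targets are list indices rather than byte addresses, `write` stands in for the ldstr plus call pair, and none of this reflects how the real JIT compiler works internally.

```python
def run(program):
    """Interpret a tiny, made-up subset of IL-like instructions; return what was written."""
    stack, local_vars, output = [], {}, []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "ldc":        # push a constant
            stack.append(args[0])
        elif op == "stloc":    # pop into a local variable
            local_vars[args[0]] = stack.pop()
        elif op == "ldloc":    # push a local variable
            stack.append(local_vars[args[0]])
        elif op == "bge":      # branch if left >= right
            right, left = stack.pop(), stack.pop()
            if left >= right:
                pc = args[0]
        elif op == "write":    # simplified stand-in for ldstr + call Console.Write
            output.append(args[0])
    return output


# Shape of the IL the C# compiler emitted (values go through locals).
with_locals = [
    ("ldc", 5), ("stloc", 0), ("ldc", 6), ("stloc", 1),
    ("ldloc", 0), ("ldloc", 1), ("bge", 8), ("write", "333"),
]

# Shape of the shorter IL proposed in the question (values stay on the stack).
stack_only = [("ldc", 5), ("ldc", 6), ("bge", 4), ("write", "333")]

print(run(with_locals), run(stack_only))
```

Both programs produce the same output, which is the property an optimizer relies on: since the two forms are equivalent, the decision about where the values actually live can be left to the stage that generates machine code.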

Where is the IL_0009 address? Isn't it supposed to be sequential?

Let's have a look at the specification of the ldstr instruction (from ECMA-335):

III.4.16 ldstr – load a literal string

Format: 72 <T> […]

The ldstr instruction pushes a new string object representing the literal stored in the metadata as string (which is a string literal).

That reference to metadata above and the <T> mean that the opcode byte 72 (hexadecimal) of the instruction is followed by a metadata token, which points to a table containing strings. How big is such a token? From section III.1.9 of the same document:

Many CIL instructions are followed by a "metadata token". This is a 4-byte value, that specifies a row in a metadata table […]

So, in your case, the opcode byte 72 of the instruction is at address 0008 and the token (0x70000001 in this case, where the 0x70 byte represents the user strings table) occupies addresses 0009 to 000C.
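
The token layout described above can be checked with a short Python sketch. The `operand` bytes are an assumption: they are the token value 0x70000001 from this answer written in little-endian order, which is how multi-byte operands are stored in the IL stream. `split_token` is a hypothetical helper name, not part of any .NET tooling.

```python
import struct

# Assumed operand bytes at IL_0009..IL_000C: the token 0x70000001 stored little-endian.
operand = bytes([0x01, 0x00, 0x00, 0x70])


def split_token(raw):
    """Split a 4-byte little-endian metadata token into (table byte, 24-bit index)."""
    (token,) = struct.unpack("<I", raw)
    return token >> 24, token & 0x00FFFFFF


table, index = split_token(operand)
print(f"table=0x{table:02X} index=0x{index:06X}")
```

The high byte selects the table (0x70 for user strings) and the remaining three bytes locate the entry within it, which is why the instruction needs four bytes after its opcode.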

svick answered Sep 19 '22 11:09