
How does callvirt work under the hood?

Tags: .net, clr, cil

I am trying to understand how the CLR implements reference types and polymorphism. I have been reading Don Box's Essential .NET, Volume 1, which has helped clarify most of it. But I got confused by the following issue when I played around with some IL code to understand it better.

I will try to explain the problem as best I can. Consider the following code:

class Base
{
    public void m()
    {
        Console.WriteLine("Base.m");
    }
}
class Derived : Base
{
    public void m()
    {
        Console.WriteLine("Derived.m");
    }
}

Now consider a simple console application whose Main method has the IL shown below. I manually tweaked the IL generated by the compiler and reassembled it with ILAsm.exe.

.class private auto ansi beforefieldinit Console1.Program
       extends [mscorlib]System.Object
{
    .method private hidebysig static void  Main(string[] args) cil managed
    {
      .entrypoint
      // Code size       44 (0x2c)
      .maxstack  1
      .locals init ([0] class Console1.Base d)
      nop
      newobj     instance void Console1.Base::.ctor()
      stloc.0
      ldloc.0
      callvirt   instance void Console1.Derived::m()
      nop
      call       string [mscorlib]System.Console::ReadLine()
      pop
      ret
    } // end of method Program::Main
} // end of class Console1.Program

I expected this code NOT to run, because the object reference points to an instance of Base, and the method table of a Base object cannot have an entry for the method m() defined in the Derived class.

But, as if by magic, this code executes Derived.m()!

So, there are two things I don't understand about the above code:

  1. What is the significance of the type specified in the IL line below? I experimented by changing it to different types (e.g. System.Exception!) and no errors were reported. Why?

    .locals init ([0] class Console1.Base d)

  2. How exactly does callvirt work? How did the call get routed to Derived.m()?

Thanks in advance!!

Regards, Ajay

ajay asked Nov 27 '10 15:11


4 Answers

My guess is that the jitter realizes that Derived.m isn't virtual and thus can never point anywhere else. So the callvirt reduces to a null-check and a call instead of a call through the v-table.

Try making Derived.m virtual. I bet it'll throw then.

The C# compiler emits callvirt instructions even when calling non-virtual methods if it can't prove that this != null, so that the call gets a null check. In that case the jitter is intelligent enough to replace the virtual call with a normal call to a fixed address (or even inline it).
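The null check described above can be observed from plain C#. The following is a minimal sketch, not the code from the question: the class mirrors the question's Base, but m() returns a string instead of writing to the console, and the CallvirtNullCheckDemo wrapper is a hypothetical name added for illustration. The method never touches this, so the only thing that can fail is the null check attached to the call.

```csharp
using System;

class Base
{
    // Non-virtual, and it never reads `this`.
    public string m() => "Base.m";
}

static class CallvirtNullCheckDemo
{
    // Returns true if calling the non-virtual method on a null reference throws.
    public static bool ThrowsOnNull()
    {
        Base b = null;
        try
        {
            b.m();          // compiled to callvirt, which null-checks the receiver
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(ThrowsOnNull()); // True
    }
}
```

If the compiler had emitted a plain call instruction instead, the method body would run with a null this and nothing would throw, since the body never dereferences it.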

And you should check whether your code is verifiable. I think it isn't.

CodesInChaos answered Nov 01 '22 15:11


Your code isn't verifiable (run it through peverify). I've written a blog post about how callvirt works under-the-hood that might help you understand what it does, and how your code executes.

Bear in mind that the CLR does try to execute non-verifiable code when it is run as a normal program; it borks only if the code actually causes a problem.

In your example, calling Derived.m() on an instance of Base works because the actual run-time binary representation of the object instances is the same; the this object is basically the same, and no instance fields of the objects are accessed.

Try putting an instance field access into both methods and see what happens...
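A rough C# analogue of the hand-edited IL, for anyone who wants to experiment without ILAsm. This is only a sketch under assumptions: it relies on System.Runtime.CompilerServices.Unsafe.As, which reinterprets a reference without any type check and is available in-box on recent .NET versions (not on the 2010-era .NET Framework the question used). The m() methods return strings rather than writing to the console, and UncheckedCastDemo is a hypothetical wrapper name. The behaviour is outside what the runtime guarantees, so treat any result as an observation rather than a contract.

```csharp
using System;
using System.Runtime.CompilerServices;

class Base
{
    public string m() => "Base.m";
}

class Derived : Base
{
    // Hides Base.m, as in the question; it does not read any instance field.
    public new string m() => "Derived.m";
}

static class UncheckedCastDemo
{
    public static string CallDerivedOnBase()
    {
        object o = new Base();
        // No type check happens here: `d` is typed as Derived
        // but still refers to a Base instance.
        Derived d = Unsafe.As<Derived>(o);
        // Non-virtual call: the target is chosen from the static type of `d`.
        return d.m();
    }

    static void Main()
    {
        Console.WriteLine(CallDerivedOnBase());
    }
}
```

Adding a field that exists only in Derived and reading it inside Derived.m() is the experiment suggested above; in that case the method would read memory that does not belong to the Base instance, so expect garbage or a crash.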

thecoop answered Nov 01 '22 15:11


Please note that, by default, code executed from the local machine is not verified. This means that invalid code can be written and executed. I suspect your Main function will not pass verification as-is. The PEVerify tool can check an assembly to ensure the code is type-safe, or you can enable these checks for code from the local machine or from a specific location via Security Policy Administration.

The purpose of the type in the locals statement is to declare the type of the local variable. This provides the information needed by the type verifier to verify that member accesses on the local variable are operating on an object of the correct type.

Callvirt could be implemented in several ways. The most likely is the way C++ vtables are implemented: each object holds a pointer to its type's table of function pointers, and each virtual function sits at a predefined offset in that table. To call the function, the address at the predefined offset is loaded and called. Note that in some cases the CLR could apply additional optimizations if the type of the object is known. Whether it does so, I don't know.
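To make the vtable idea concrete, here is a toy model in C#. It is a sketch of the general technique only and not how the CLR lays out its method tables: MethodTable, Obj, VTableSketch and the slot constant are all hypothetical names, and delegates stand in for raw function pointers.

```csharp
using System;

// Hypothetical model of a per-type method table: one entry per virtual slot.
sealed class MethodTable
{
    public readonly Func<Obj, string>[] Slots;
    public MethodTable(params Func<Obj, string>[] slots) { Slots = slots; }
}

// Hypothetical model of an object: every instance points at its type's table.
sealed class Obj
{
    public readonly MethodTable Table;
    public Obj(MethodTable table) { Table = table; }
}

static class VTableSketch
{
    const int SlotM = 0; // predefined offset of the virtual method m

    static readonly MethodTable BaseTable = new MethodTable(self => "Base.m");
    // An overriding type stores its own function in the same slot.
    static readonly MethodTable DerivedTable = new MethodTable(self => "Derived.m");

    // Virtual call: load the table from the object, index the slot, call through it.
    public static string CallVirtual(Obj o) => o.Table.Slots[SlotM](o);

    public static string DispatchOnBase() => CallVirtual(new Obj(BaseTable));
    public static string DispatchOnDerived() => CallVirtual(new Obj(DerivedTable));

    static void Main()
    {
        Console.WriteLine(DispatchOnBase());    // Base.m
        Console.WriteLine(DispatchOnDerived()); // Derived.m
    }
}
```

The call site is identical for both objects; only the table reached through the object differs. A non-virtual method such as the question's m() never goes through this lookup, because its target is fixed when the call is compiled.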

Dark Falcon answered Nov 01 '22 16:11


I think this is a side effect of a JIT compiler optimization. If the m() method were virtual, the JIT compiler would have to generate machine code to dig the method table pointer out of the object and then make the virtual call. But this method isn't virtual, and the JIT compiler already knows the method table pointer for the Derived class, so it bypasses the pointer retrieval and supplies it directly, which makes the call work as you observed. You can verify my guess by checking the generated machine code.

Yeah, the IL verifier isn't scoring any points here. You could make it more interesting by having the Derived.m() method tinker with a field that's only declared in Derived. I've seen too much Reflection.Emit code crash with an AccessViolation to be greatly surprised by this. It may well be intentional, however: there is no need to verify IL that crashes anyway. I'm not sure; exploiting this kind of verification loophole isn't (yet) common. Thankfully.

Hans Passant answered Nov 01 '22 15:11