Does adding virtual to a C# method may break legacy clients?

Question

The question is very straightforward,

if I have a following class:

public class ExportReservationsToFtpRequestOld 
{
    public int A { get; set; }
    public long B { get; set; }
}

and change it to:

public class ExportReservationsToFtpRequestOld 
{
    public virtual int A { get; set; }
    public virtual long B { get; set; }
}

can it break a legay client dll?

Eric Lippert · Accepted Answer

The answer from Dai is good but a little hard to read and it buries the lede. Let's not bury the lede.

can making a non-virtual instance method virtual break a legacy client dll?

Yes. The breakage is subtle and unlikely, but it is possible. Legacy clients should be recompiled when you make a member of a dependency virtual.

More generally: if you are changing anything about a base class's public or protected surface area, recompile all the assemblies that make derived classes.

Let's see how this specific scenario can break a legacy client. Suppose we have a dependency assembly with:

public class B {
  public void M() { }
}

and then we use this in a client assembly:

class C {
  static void Q() {
    B b = new B();
    b.M();
  }
}

What IL is generated?

    newobj instance void B::.ctor()
    callvirt instance void B::M()
    ret

Perfectly reasonable code. C# generates a callvirt to a non-virtual call because that means we do not have to emit a check to ensure that the receiver is non-null. This keeps the code small.

If we update B.M to be virtual, the call site does not need to change; it is already doing a virtual call. So everything is fine, right?

Now, suppose before the new version of the dependency ships, some super genius comes along and says oh, we can refactor this code into the obviously better code:

  static void Q() {
    new B().M();
  }

Surely that refactoring changes nothing, right?

WRONG. The code generated is now:

  newobj instance void B::.ctor()
  call instance void B::M()
  ret

C# reasons "I'm doing a call to a non-virtual method and I know the receiver is a new expression which never produces null, so I'll save that nanosecond and skip the null check".

Why not do that in the first case? Because C# does not do the control flow analysis in the first version and deduce that on every control flow, the receiver is known to be non-null. It just does a cheap check to see if the receiver is one of a handful of expressions known to be impossible to be null.

If you now change the dependency B.M to a virtual method and do not recompile the assembly with the call site, the code in the call site is now not verifiable because it violates the CLR security rules. A non-virtual call to a virtual method is only legal when the call is directly in a member of a derived type.

See my comment on another answer for a scenario that motivates this security design decision.

Aside: the rule applies even to nested types! That is, if we have class D : B { class N { } } then code inside N is not allowed to do a non-virtual call to a virtual member of B, though code inside D is!

So already we have a problem; we're turning verifiable code in another assembly we don't own into nonverifiable code.

But wait, it gets worse.

Suppose we have a slightly different scenario. I suspect this is the scenario actually motivating your change.

// Original code
public class B {
  public void M() {}
}
public class D : B { }

And a client

class C {
  static void Q() {
    new D().M();
  }
}

What code is generated now? The answer might surprise you. It is the same as before. C# does NOT generate

  call instance void D::M()

Rather it generates

  call instance void B::M()

Because after all, that's the method being called.

And now we change the dependency to

// New code
class B {
  public virtual void M() {}
}
class D : B { 
  public override void M() {}
}

The authors of the new code reasonably believe that all calls to new D().M() should be dispatched to D.M, but as we see, the not-recompiled client will still be doing an unverifiable non-virtual dispatch to B.M! So this is a not-breaking change in the sense that the client still gets the behaviour they used to get (assuming they ignore the verification failure) but that behaviour is no longer correct, and will change upon recompilation.

The basic problem here is that a non-virtual call can show up where you don't expect it, and then if you change the requirement to make the call virtual, that doesn't happen until recompilation.

Let's look at another version of the scenario we just did. In the dependency we have as before

public class B { public void M() {} }
public class D : B {}

and in our client we now have:

interface I { void M(); }
class C : D, I {}
...
I i = new C();
i.M();

All is good; C inherits from D, which gives it a public member B.M which implements I.M and we're all set.

Except that there's a problem. The CLR requires that a method B.M that implements I.M be virtual, and B.M is not. Rather than rejecting this program, C# pretends that you wrote:

class C : D, I 
{
  void I.M()
  {
    base.M();
  }
}

Where base.M() is compiled as non-virtual call to B.M(). After all, we know that this is non-null and B.M() is not virtual, so we can do a call instead of a callvirt.

But now what happens when we recompile the dependency without recompiling the client:

class B {
  public virtual void M() {}
}
class D : B { 
  public override void M() {}
}

Now calling i.M() will do a verifiable non-virtual call to B.M but the author of D.M expected that D.M would be called in this scenario, and on re-compilation of the client, it will be.

Finally, there are potentially more scenarios involving explicit base. calls where changing a dependency "in the middle" of a class hierarchy can produce bizarre unexpected results. See https://blogs.msdn.microsoft.com/ericlippert/2010/03/29/putting-a-base-in-the-middle/ for details of that scenario. This is not your scenario, but it further illustrates the dangers of non-virtual calls to virtual methods.

Dai · Answer

C# is compiled to CIL (formerly known as MSIL).
Property accesses and assignments are compiled to method calls:
- value = foo.Bar becomes value = foo.get_Bar().
- foo.Bar = value becomes foo.set_Bar( value ).
Method calls are compiled to call or callvirt opcodes.
The call and callvirt opcodes' first operand is a "symbol"/identifier by-name, so changing your class's members from non-virtual to virtual will not break JIT compilation.
call and callvirt can both be used to call virtual and non-virtual methods with differing behaviour, and the opcodes are chosen by the compiler for various reasons, and importantly the compiler may use callvirt to call a non-virtual method ( http://www.levibotelho.com/development/call-and-callvirt-in-cil/ )
- .call NonVirtualMethod
  - The NonVirtualMethod is directly invoked.
- .callvirt NonVirtualMethod
  - The NonVirtualMethod is directly invoked (with a null-check).
- .call VirtualMethod
  - The VirtualMethod is directly invoked, even if the current object overrides it.
- .callvirt VirtualMethod
  - The current object's override of VirtualMethod is invoked.

So while a compiled application will still start and JIT after swapping your old binary assembly for a new one with virtual members there are still some scenarios to consider regarding the behavior of the consumer of your assembly depending on what opcode (call or callvirt) the consumer's compiler used:

A consumer binary-assembly has .call ExportReservationsToFtpRequestOld::get_A

Provided you don't pass any subclass of ExportReservationsToFtpRequestOld with overridden members to the consumer, then the correct property will be invoked. If you do pass a subclass with overridden virtual members then the overridding version will not be invoked:

It is valid to call a virtual method using call (rather than callvirt); this indicates that the method is to be resolved using the class specified by method rather than as specified dynamically from the object being invoked.

(I'm surprised that C# doesn't let consumer types do this explicitly, only classes in an inheritance tree can use the base keyword).
A consumer binary-assembly has .callvirt ExportReservationsToFtpRequestOld::get_A

If the consumer is working with a subclass then the subclass's override of get_A will be invoked, and not necessarily ExportReservationsToFtpRequestOld's version.
A consumer already subclasses ExportReservationsToFtpRequestOld and adds shadows (new) versions of get_A and get_B, then calls those properties:
```
class Derived : ExportReservationsToFtpRequestOld {

    public new int A { get; set; }
    public new long B { get; set; }
}
```
or even:
```
class Derived : ExportReservationsToFtpRequestOld {

    public new virtual int A { get; set; }
    public new virtual long B { get; set; }
}

// with:

class Derived2 : Derived {

    public override int A { get; set; }
    public override long B { get; set; }
}
```
As Derived's members have different internal identifiers then ExportReservationsToFtpRequestOld's get_A and get_B will not be invoked. Even if the consumer's compiler used .callvirt instead of .call, the virtual-method lookup will start with their subclass, not ExportReservationsToFtpRequestOld. Things get complicated with Derived2 however and what happens depends on how it's consumed, see here: what is "public new virtual void Method()" mean?

TL;DR:

If you're sure no-one is deriving from ExportReservationsToFtpRequestOld with shadow+virtual members then go ahead and change it to virtual - you won't break anything.

Does adding virtual to a C# method may break legacy clients?

Tags:

c#

dll

Ricardo Alves

2 Answers

Eric Lippert

TL;DR:

Dai

Recent Activity

Donate For Us

Does adding virtual to a C# method may break legacy clients?

Tags:

c#

dll

Ricardo Alves

2 Answers

Eric Lippert

TL;DR:

Dai

Related questions

Recent Activity

Donate For Us