
Compiler generated unexpected `IN AL, DX` (opcode `EC`) while setting up call stack

I was looking at some compiler output, and when a function is called it usually starts setting up the call stack like so:

PUSH EBP
MOV EBP, ESP
PUSH EDI
PUSH ESI
PUSH EBX

So we save the base pointer of the calling routine on the stack, point our own base pointer at the current top of the stack, and then save the contents of a few registers on the stack. These are then restored to their original values at the end of the routine, like so:

LEA ESP, [EBP-0Ch]
POP EBX
POP ESI
POP EDI
POP EBP
RET
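
The 0Ch in the LEA above follows from the three 4-byte pushes made after EBP is set. Here is a minimal Python sketch of that arithmetic, assuming a 32-bit stack and an arbitrary starting ESP; the name run_prologue is made up for illustration and this is not a model of what the JIT actually does internally:

```python
# Minimal model of a 32-bit stack: ESP decreases by 4 on each PUSH.
WORD = 4

def run_prologue(esp):
    """Return (ebp, esp) after PUSH EBP / MOV EBP, ESP / three register pushes."""
    esp -= WORD              # PUSH EBP
    ebp = esp                # MOV EBP, ESP
    for _reg in ("EDI", "ESI", "EBX"):
        esp -= WORD          # PUSH EDI / PUSH ESI / PUSH EBX
    return ebp, esp

ebp, esp = run_prologue(0x1000)  # 0x1000 is an arbitrary starting ESP (assumption)
offset = ebp - esp               # distance from EBP down to the last saved register
print(hex(offset))
```

So LEA ESP, [EBP-0Ch] points ESP back at the saved EBX regardless of what the body of the routine did to ESP, and the four POPs then unwind in reverse order.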

So far, so good. However, I noticed that in one routine the code that sets up the call stack looks a little different. In fact, it looks like this:

IN AL, DX
PUSH EDI
PUSH ESI
PUSH EBX

This is quite confusing for a number of reasons. For one thing, the end-of-method code is identical to that quoted above for the other method, and in particular seems to expect a saved copy of EBP to be available on the stack.

For another, if I understand correctly, the instruction IN AL, DX reads a byte from the I/O port named by DX into the AL register, which is the low byte of the EAX register, and as it so happens the very next instruction here is

XOR EAX, EAX

as the program wants to zero a few things it allocated on the stack, so whatever had been read into AL would be discarded immediately.

Question: I'm wondering exactly what's going on here that I don't understand. The machine code being translated as IN AL, DX is the single byte EC, whereas the pair of instructions PUSH EBP / MOV EBP, ESP would correspond to the three bytes 55 8B EC. Is the disassembler misreading this somehow? Or is something relying on a side effect I don't understand?


If anyone's curious, this machine code was generated by the CLR's JIT compiler, and I'm viewing it with the Visual Studio debugger. Here's a minimal reproduction in C#:

class C {
  string s = "";
  public void f(string s) {
    this.s = s;
  }
}

However, note that this seems to be non-deterministic; sometimes I seem to get the IN AL, DX version, while other times there's a PUSH EBP followed by a MOV EBP, ESP.


EDIT: I'm starting to strongly suspect a disassembler bug -- I just got another situation where it shows IN AL, DX (opcode EC) and the two preceding bytes in memory are 55 8B. So perhaps the disassembler is simply confused about the entry point of the method. (Though I'd still like some insight as to why that's happening!)

Daniel McLaury asked Feb 17 '17 21:02


1 Answer

Sounds like you are using VS2015. Your conclusion is correct: its debugging engine has a lot of bugs, and yes, it is disassembling from the wrong address. That is not the only problem. It does not restore breakpoints properly, so you are apt to see the INT3 instruction still in the code. And it can't correctly refresh the disassembly when the jitter has re-generated the code and replaced stub calls. You can't trust anything you see.
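
To illustrate the wrong-address effect: the usual encoding of PUSH EBP / MOV EBP, ESP is 55 8B EC, so a disassembler that starts two bytes late sees a lone EC, which is IN AL, DX. The Python sketch below uses a tiny hand-written table that covers only the instructions quoted in the question; the table, the decode function, and the sample byte string are assumptions for illustration, not how the Visual Studio disassembler works.

```python
# Hand-written table for the few encodings needed here (not a real disassembler).
# Keyed by first opcode byte; each entry is (full byte sequence, mnemonic).
TABLE = {
    0x55: (b"\x55", "PUSH EBP"),
    0x8B: (b"\x8B\xEC", "MOV EBP, ESP"),  # only this one ModRM form is covered
    0xEC: (b"\xEC", "IN AL, DX"),
    0x57: (b"\x57", "PUSH EDI"),
    0x56: (b"\x56", "PUSH ESI"),
    0x53: (b"\x53", "PUSH EBX"),
}

def decode(code, start):
    """Decode `code` linearly from byte offset `start` using TABLE."""
    out = []
    pos = start
    while pos < len(code):
        entry = TABLE.get(code[pos])
        if entry is None or code[pos:pos + len(entry[0])] != entry[0]:
            out.append("???")   # byte sequence not covered by this sketch
            pos += 1
            continue
        out.append(entry[1])
        pos += len(entry[0])
    return out

# Assumed prologue bytes: PUSH EBP; MOV EBP, ESP; PUSH EDI; PUSH ESI; PUSH EBX
code = bytes.fromhex("558BEC575653")

correct = decode(code, 0)   # start at the real entry point
shifted = decode(code, 2)   # start two bytes late, as the buggy view appears to
print(correct)
print(shifted)
```

Starting at offset 0 yields the normal prologue, while starting at offset 2 yields IN AL, DX followed by the three register pushes, which matches the listing in the question.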

I recommend you use Tools > Options > Debugging > General and tick the "Use Managed Compatibility Mode" checkbox. That forces the debugger to use an older debugging engine, VS2010 vintage. It is much more stable.

You'll lose some features with this engine, like return value inspection and 64-bit Edit+Continue. Won't be missed when you do this kind of debugging. You will however see fake code addresses, as was always common before, so all CALL addresses are wrong and you can't easily identify calls into the CLR. Flipping the engine back-and-forth is a workaround of sorts, but of course a big annoyance.

This has not been worked on either; I saw no improvements in the Updates. But they no doubt had a big bug list to work through, since VS2015 shipped before it was done. Hopefully VS2017 is better, we'll find out soon.

Hans Passant answered Sep 21 '22 12:09