x86 reserved EFLAGS bit 1 == 0: how can this happen?

I'm using the Win32 API to stop/start/inspect/change thread state. Generally works pretty well. Sometimes it fails, and I'm trying to track down the cause.

I have one thread that is forcing context switches on other threads by:

thread stop
fetch processor state into windows context block
read thread registers from windows context block to my own context block
write thread registers from another context block into windows context block
restart thread

This works remarkably well, but very rarely a context switch seems to fail. (Symptom: my multithreaded system blows sky high, executing in strange places with strange register contents.)

The context control is accomplished by:

if ((suspend_count=SuspendThread(WindowsThreadHandle))==(DWORD)-1)  // returns a DWORD; failure is (DWORD)-1, so "<0" can never be true
   { printf("TimeSlicer Suspend Thread failure");
      ...
   }
...
Context.ContextFlags = (CONTEXT_INTEGER | CONTEXT_CONTROL | CONTEXT_FLOATING_POINT);
if (!GetThreadContext(WindowsThreadHandle,&Context))
   {   printf("Context fetch failure");
       ...
   }

ContextSwap(&Context); // does the context swap

if (ResumeThread(WindowsThreadHandle)==(DWORD)-1)  // same failure convention as SuspendThread
   {  printf("Thread resume failure");
        ...
   }

None of the print statements ever get executed. I conclude that Windows thinks the context operations all happened reliably.

Oh, yes, I do know when a thread being stopped is not computing [e.g., in a system function] and won't attempt to stop/context switch it. I know this because each thread that does anything other-than-computing sets a thread specific "don't touch me" flag, while it is doing other-than-computing. (Device driver programmers will recognize this as the equivalent of "interrupt disable" instructions).
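
A minimal sketch of such a "don't touch me" marker, using C11 atomics. The names (`worker_state`, `enter_no_preempt`, `leave_no_preempt`, `may_context_switch`) are hypothetical, and the use of a nesting counter instead of a single flag is an assumption; the actual mechanism is not shown in this question.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread record; the real layout is not shown above. */
typedef struct {
    atomic_int dont_touch;  /* > 0 while the thread is doing other-than-computing */
} worker_state;

/* Called by the worker before it enters a system call or similar region. */
static inline void enter_no_preempt(worker_state *w)
{
    atomic_fetch_add(&w->dont_touch, 1);
}

/* Called by the worker once it is back to pure computation. */
static inline void leave_no_preempt(worker_state *w)
{
    atomic_fetch_sub(&w->dont_touch, 1);
}

/* Called by the time slicer before it tries to swap this thread's context. */
static inline bool may_context_switch(worker_state *w)
{
    return atomic_load(&w->dont_touch) == 0;
}
```

A counter rather than a boolean lets protected regions nest, which is a design choice of this sketch only.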

So, I wondered about the reliability of the content of the context block. I added a variety of sanity tests on various register values pulled out of the context block; you can actually decide that ESP is OK (within bounds of the stack area defined in the TIB), PC is in the program that I expect or in a system call, etc. No surprises here.

I decided to check that the condition code bits (EFLAGS) were being properly read out; if this were wrong, it would cause a switched task to take a "wrong branch" when its state was restored. So I added the following code to verify that the purported EFLAGS value contains only what EFLAGS should contain according to the Intel reference manual (see also http://en.wikipedia.org/wiki/FLAGS_register).

   mov        eax, Context.EFlags[ebx]  ; ebx points to Windows Context block
   mov        ecx, eax                ; check that we seem to have flag bits
   and        ecx, 0FFFEF32Ah         ; where we expect constant flag bits to be
   cmp        ecx, 000000202h         ; expected state of constant flag bits
   je         @f
   breakpoint                         ; trap if unexpected flag bit status
@@:
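
For readers who prefer C, the same test can be sketched as below. The mask and expected value are copied from the assembly above; the macro and function names are made up for illustration, and the input is assumed to be the 32-bit EFlags value read from the context block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits expected to be constant: reserved bits, TF, IF, IOPL, NT, and bits 17..31. */
#define EFLAGS_CONSTANT_MASK   0xFFFEF32Au
/* Expected state of those bits: reserved bit 1 set and IF (bit 9) set. */
#define EFLAGS_CONSTANT_EXPECT 0x00000202u

/* Mirrors "and ecx, 0FFFEF32Ah": keep only the bits expected to be constant. */
uint32_t eflags_constant_bits(uint32_t eflags)
{
    return eflags & EFLAGS_CONSTANT_MASK;
}

/* Mirrors "cmp ecx, 000000202h": true when the constant bits look as expected. */
bool eflags_constant_bits_ok(uint32_t eflags)
{
    return eflags_constant_bits(eflags) == EFLAGS_CONSTANT_EXPECT;
}
```

A value such as 0246h (PF, ZF, IF, and bit 1 set) passes this check, while a value with bit 1 clear leaves 0200h after masking and fails it.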

On my Win 7 AMD Phenom II X6 1090T (six-core) system, it occasionally traps at the breakpoint with ECX = 0200h. It fails the same way on my Win 7 Intel i7 system. I would ignore this, except that it hints the EFLAGS aren't being stored correctly, as I suspected.

According to my reading of the Intel (and also the AMD) reference manuals, bit 1 is reserved and always has the value "1". Not what I see here.

Obviously, MS fills the context block by doing complicated things on a thread stop. I expect them to store the state accurately. This bit isn't stored correctly. If they don't store this bit correctly, what else don't they store?

Any explanations for why the value of this bit could/should be zero sometimes?

EDIT: My code dumps the registers and the stack on catching a breakpoint.

The stack area contains the context block as a local variable. Both EAX, and the value in the stack at the proper offset for EFLAGS in the context block contain the value 0244h. So the value in the context block really is wrong.
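
To make the value 0244h concrete, here is a small sketch that names the documented status and control flags set in an EFLAGS value and tests reserved bit 1 separately. The helper names (`describe_eflags`, `reserved_bit1_set`) are hypothetical; the bit positions are the standard ones for the low 12 bits of EFLAGS.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flag_name { uint32_t bit; const char *name; };

/* Documented flags in the low 12 bits of EFLAGS. */
static const struct flag_name kFlags[] = {
    {1u << 0, "CF"}, {1u << 2, "PF"}, {1u << 4, "AF"},
    {1u << 6, "ZF"}, {1u << 7, "SF"}, {1u << 8, "TF"},
    {1u << 9, "IF"}, {1u << 10, "DF"}, {1u << 11, "OF"},
};

/* Writes the names of the set flags, space separated, into out.
   Returns the number of characters written (excluding the terminator). */
size_t describe_eflags(uint32_t eflags, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof kFlags / sizeof kFlags[0]; ++i) {
        if (eflags & kFlags[i].bit) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             used ? " " : "", kFlags[i].name);
            if (n < 0 || (size_t)n >= out_size - used)
                break;  /* buffer too small: stop adding names */
            used += (size_t)n;
        }
    }
    return used;
}

/* Reserved bit 1, which the manuals describe as always reading 1. */
int reserved_bit1_set(uint32_t eflags)
{
    return (eflags & 0x2u) != 0;
}
```

For 0244h this yields PF, ZF, and IF, with reserved bit 1 clear, which is consistent with the masked value of 0200h seen in ECX.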

EDIT2: I changed the mask and comparison values to

and        ecx, 0FFFEF328h         ; was 0FFFEF32Ah; where we expect flag bits to be
cmp        ecx, 000000200h

This seems to run reliably with no complaints. Apparently Win7 doesn't do bit 1 of eflags right, and it appears not to matter.

Still interested in an explanation, but apparently this is not the source of my occasional context switch crash.

Ira Baxter, asked Apr 01 '14 05:04


1 Answer

Microsoft has a long history of squirreling away a few bits in places that aren't really used. Raymond Chen has given plenty of examples, e.g. using the low bit(s) of a pointer to data that is aligned to more than one byte, where those bits would otherwise always be zero.

In this case, Windows might have needed to store some of its thread context in an existing CONTEXT structure, and decided to use an otherwise unused bit in EFLAGS. You couldn't do anything with that bit anyway, and Windows will get that bit back when you call SetThreadContext.

MSalters, answered Nov 15 '22 04:11