In the x86 instruction set, the bit at index 1 of an opcode can be either the direction bit, which specifies which operand is the destination and which is the source, or a sign-extend bit.
e.g. for ADD, where bit 1 is the direction bit:
00 /r    ADD r/m8, r8
versus
02 /r    ADD r8, r/m8
(r/m, reg vs. reg, r/m for the same mnemonic)
and where bit 1 is the sign-extend bit:
81 /0 id ADD r/m32, imm32
versus
83 /0 ib ADD r/m32, imm8
I'm wondering what the easiest logical way is to determine which of these cases applies. Is there a way to tell, other than going through the instruction opcodes one by one and comparing them, whether a given instruction uses bit 1 as a sign-extend bit or as a direction bit? There are also instructions that disregard this bit, but since it's set to 0 in those, it doesn't really matter.
EDIT: Turns out that for write faults (which is what my code was intended for), reg->r/m is always the case, because an r/m->reg instruction never writes to memory and so will never trigger a write fault. Any information would still be nice in case someone else runs into a similar issue.
[Comment made into answer].
You obviously need a boolean formula over the stream of instruction bytes. I wouldn't know how to define that formula easily; the x86 has a really messy instruction set. I'd expect the key trick is to look up the opcode byte in a table selected by the prefix bytes. If you are writing some kind of disassembler, I'd expect you to have such tables already anyway.
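As a rough sketch of that table-lookup idea, assuming only one-byte opcodes and no prefix handling, the role of bit 1 could be classified as below. The names `bit1_role` and `rm_is_destination` are hypothetical, and the opcode ranges cover just a few well-known groups (the ModRM forms of the basic ALU instructions, MOV 88-8B, the 80-83 immediate group, PUSH/IMUL with an immediate); this is not a complete decoder.

```python
# Hypothetical helpers: classify what bit 1 of a one-byte x86 opcode means.
# Only a few well-known opcode groups are covered; the rest is "unknown".

def bit1_role(opcode: int) -> str:
    """Return 'direction', 'sign-extend', or 'unknown' for a primary opcode byte."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a single byte")

    # ADD, OR, ADC, SBB, AND, SUB, XOR, CMP in their ModRM forms:
    # 00-03, 08-0B, ..., 38-3B (bit 2 clear selects the ModRM forms).
    if opcode < 0x40 and (opcode & 0x04) == 0:
        return "direction"

    # MOV r/m, reg and MOV reg, r/m: 88-8B.
    if 0x88 <= opcode <= 0x8B:
        return "direction"

    # Immediate group 80-83, PUSH imm (68/6A), IMUL with immediate (69/6B).
    if 0x80 <= opcode <= 0x83 or 0x68 <= opcode <= 0x6B:
        return "sign-extend"

    return "unknown"


def rm_is_destination(opcode: int) -> bool:
    """For a direction-bit opcode, d == 0 puts r/m in the first operand slot.

    Note that CMP (38-3B) uses that slot but does not write to it.
    """
    if bit1_role(opcode) != "direction":
        raise ValueError("opcode has no direction bit in this sketch")
    return (opcode & 0x02) == 0
```

With the opcodes from the question, `bit1_role(0x00)` and `bit1_role(0x02)` report a direction bit, while `bit1_role(0x81)` and `bit1_role(0x83)` report a sign-extend bit. A real decoder would also need separate tables for the two-byte (0F-prefixed) opcode maps.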
The direction and sign bits are part of the flags register of the x86 processors. The lowest eight bits of the flags keep the layout inherited from the 8080/8085, so the sign flag (SF) sits at bit 7; the bit at index 1 of the flags register is reserved and always reads as 1. The direction flag (DF) is bit 10, and its position has not changed since it was introduced with the 8086/88 processors in the late 70s, if my memory serves me.
The sign bit is modified as a result of an arithmetic operation and is a copy of the highest bit of the operation's result. INC and DEC update the sign bit as well; the flag they leave untouched is the carry flag.
The direction bit is manipulated using the cld/std instructions and controls whether the block instructions (cmps, ins, lods, movs, outs, scas and stos) post-increment or post-decrement their index registers.
Both flags may also be manipulated via the stack (though this is perhaps not meaningful with the sign bit):
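To illustrate the effect of the direction bit on a block instruction, here is a small model of a repeated byte move in Python. The function `copy_bytes` and its parameters are made up for this sketch; it only imitates the index stepping and is not an emulator.

```python
# Illustrative model of "rep movsb": df selects the stepping direction.

def copy_bytes(mem: bytearray, si: int, di: int, count: int, df: bool):
    """Copy count bytes from mem[si] to mem[di], stepping like movsb.

    df == False (after cld): indices are incremented after each byte.
    df == True  (after std): indices are decremented after each byte.
    Returns the final (si, di) pair.
    """
    step = -1 if df else 1
    for _ in range(count):
        mem[di] = mem[si]
        si += step
        di += step
    return si, di
```

With DF clear the copy starts at the lowest addresses and walks upwards; with DF set the caller has to pass the addresses of the last bytes, and the copy walks downwards.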
pushf
and dword ptr [esp],SOME_MASK
popf
Using "and" is an example: or, xor and others may also be used.
If you manipulate the flag you may have to restore it to its previous value as some run-time libraries assume that it isn't modified.
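To make the masking concrete, here is a sketch that treats the flags value as a plain integer and mimics the pushf / modify / popf sequence, including restoring the direction flag afterwards. The names `SF_MASK`, `DF_MASK` and the helper functions are illustrative only; the bit positions used (SF at bit 7, DF at bit 10) are the architectural ones.

```python
# Illustrative model of the flags register as an integer.
SF_MASK = 1 << 7    # sign flag
DF_MASK = 1 << 10   # direction flag

def clear_df(flags: int) -> int:
    """Like pushf / and [esp], NOT DF_MASK / popf (same effect as cld)."""
    return flags & ~DF_MASK

def set_df(flags: int) -> int:
    """Like pushf / or [esp], DF_MASK / popf (same effect as std)."""
    return flags | DF_MASK

def with_df_restored(flags: int, body) -> int:
    """Run body(flags), then put DF back to the value it had on entry."""
    saved_df = flags & DF_MASK
    flags = body(flags)
    return (flags & ~DF_MASK) | saved_df
```

For example, `with_df_restored(flags, set_df)` returns the flags with DF unchanged, which is the behaviour surrounding code typically expects.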