Instruction Decoding in x86 architecture [closed]

I am working on an operating system project for my lab where I have to work with the instruction pointer and instruction opcodes. Right now all I need to know is what type of instruction it is. For that I'm reading the data at the address the instruction pointer points to. The first byte of this data gives me the instruction type. For example, if the first byte is 0xC6, it is a MOVB instruction. Now there are some cases where the first byte at the instruction pointer is 0x0F. According to the documentation, 0x0F means it is a two-byte instruction. My problem is with this type of instruction: I'm not sure how to find out the instruction type for a two-byte instruction.

After that, my second priority is to find out the operands of the instruction. I have no knowledge of how to do that from code. Any sample code will be appreciated.

Third comes the need to find out the size of the instruction. As x86 is variable length, I want to know the size of each instruction. At first I planned to use a lookup table where I'd maintain the instruction name and its size. But then I discovered that the same instruction can have variable length. For example, when I ran objdump on a .o file I found two instructions: C6 00 62, which is MOVB $0x62,(%EAX), and C6 85 2C FF FF FF 00, which is MOVB $0x0,-0xD4(%EBP). Both have the same opcode byte (C6), but they are of different lengths (3 bytes and 7 bytes).

So I need answers to those three questions. It would be highly appreciated if someone could give me some solutions.

azizulhakim asked Dec 06 '22 04:12


1 Answer

Basically what you need is a set of nested case statements implementing a finite state machine scanner, where each level inspects some byte of the instruction (typically left to right) to determine what it does.
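As a minimal sketch of that nesting (not the code linked later in this answer), the fragment below classifies a handful of 32-bit opcodes, including the 0x0F escape the question asks about. The names `insn_kind`, `classify`, and `classify_two_byte` are made up for illustration, prefix bytes are ignored here, and only a few opcodes from the opcode map are covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction classes, for illustration only. */
typedef enum {
    INSN_UNKNOWN,
    INSN_NOP,           /* 90                          */
    INSN_RET,           /* C3                          */
    INSN_MOV_RM8_IMM8,  /* C6 /0 (the question's MOVB) */
    INSN_JCC_NEAR,      /* 0F 80 .. 0F 8F              */
    INSN_MOVZX_R_RM8    /* 0F B6                       */
} insn_kind;

/* Second level: called after the 0x0F escape byte has been seen. */
static insn_kind classify_two_byte(uint8_t op2) {
    if (op2 >= 0x80 && op2 <= 0x8F)
        return INSN_JCC_NEAR;
    switch (op2) {
    case 0xB6: return INSN_MOVZX_R_RM8;
    default:   return INSN_UNKNOWN;  /* not covered by this sketch */
    }
}

/* Top level: one case per first opcode byte (only a few shown). */
static insn_kind classify(const uint8_t *ip, size_t avail) {
    if (avail < 1)
        return INSN_UNKNOWN;
    switch (ip[0]) {
    case 0x90: return INSN_NOP;
    case 0xC3: return INSN_RET;
    case 0xC6:
        /* C6 is a group opcode: the reg field (bits 5..3) of the
           following ModRM byte selects the operation; /0 is MOV. */
        if (avail < 2)
            return INSN_UNKNOWN;
        return ((ip[1] >> 3) & 7) == 0 ? INSN_MOV_RM8_IMM8 : INSN_UNKNOWN;
    case 0x0F:
        /* Escape byte: the next byte is the real opcode. */
        if (avail < 2)
            return INSN_UNKNOWN;
        return classify_two_byte(ip[1]);
    default:
        return INSN_UNKNOWN;
    }
}
```

Calling `classify` on the bytes C6 00 62 yields `INSN_MOV_RM8_IMM8`, and on 0F 84 followed by a 32-bit displacement it yields `INSN_JCC_NEAR`. A real decoder would fill in the remaining cases from an opcode map.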

Your top-level case statement will pretty much be 256 cases, one for each opcode byte; you'll find some of the opcodes (especially the so-called "prefix" bytes) cause the top level to loop, picking up multiple prefix bytes that precede the main opcode byte. Subcases will acquire structure according to the opcode structure of the x86; you'll almost certainly end up with MODRM and SIB addressing-mode byte decoders/subroutines.
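To show how the prefix loop and a MODRM/SIB subroutine combine to give an instruction's length, here is a rough sketch for 32-bit code. The function names `is_prefix`, `modrm_len`, and `insn_len` are hypothetical, only a few MOV opcodes plus NOP and RET are handled, and instructions carrying the 0x67 address-size prefix are rejected because 16-bit addressing uses a different MODRM layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy prefix bytes in 32-bit mode. */
static int is_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP        */
    case 0x2E: case 0x36: case 0x3E: case 0x26: /* CS, SS, DS, ES override */
    case 0x64: case 0x65:                       /* FS, GS override         */
    case 0x66: case 0x67:                       /* operand/address size    */
        return 1;
    default:
        return 0;
    }
}

/* Bytes taken by ModRM + optional SIB + displacement (32-bit addressing).
   Returns 0 if the bytes needed to decide are not available. */
static size_t modrm_len(const uint8_t *p, size_t avail) {
    if (avail < 1) return 0;
    uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    size_t len = 1;
    if (mod == 3) return len;                   /* register operand        */
    if (rm == 4) {                              /* SIB byte follows        */
        if (avail < 2) return 0;
        len++;
        if (mod == 0 && (p[1] & 7) == 5)
            len += 4;                           /* no base, disp32         */
    }
    if (mod == 0 && rm == 5) len += 4;          /* disp32, no base         */
    else if (mod == 1)       len += 1;          /* disp8                   */
    else if (mod == 2)       len += 4;          /* disp32                  */
    return len;
}

/* Total instruction length, or 0 if unsupported or truncated. */
static size_t insn_len(const uint8_t *ip, size_t avail) {
    size_t i = 0, m, len;
    int addr16 = 0;
    while (i < avail && is_prefix(ip[i])) {     /* top level loops here    */
        if (ip[i] == 0x67) addr16 = 1;
        i++;
    }
    if (i >= avail || addr16) return 0;
    switch (ip[i++]) {
    case 0x90: case 0xC3:                       /* NOP, RET: no operands   */
        len = i;
        break;
    case 0xC6:                                  /* MOV r/m8, imm8 (/0)     */
        m = modrm_len(ip + i, avail - i);
        if (m == 0 || ((ip[i] >> 3) & 7) != 0) return 0;
        len = i + m + 1;                        /* + 1 for the imm8        */
        break;
    case 0x88: case 0x89: case 0x8A: case 0x8B: /* MOV between reg and r/m */
        m = modrm_len(ip + i, avail - i);
        if (m == 0) return 0;
        len = i + m;
        break;
    default:
        return 0;
    }
    return len <= avail ? len : 0;
}
```

On the question's two examples this gives 3 for C6 00 62 (MODRM 00 needs no displacement) and 7 for C6 85 2C FF FF FF 00 (MODRM 85 has mod = 10, so a 32-bit displacement follows), which is why a table keyed only on the opcode byte cannot hold the length.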

I've done this; the work is annoying because of the details, but not hard. You can get a pretty good solution in several hundred lines of code if you are careful. If you insist on doing the whole instruction set (vector registers and opcodes, esp. for Haswell etc.) you're likely to end up with something bigger; Intel has been jamming instructions into every dark corner they can find.

You really need an opcode map; I'm pretty sure there is one in the Intel manuals. I've found this link to be pretty useful: http://www.ref.x86asm.net/coder32.html

EDIT Sept 2015: Here at SO I provide C code that implements this: https://stackoverflow.com/a/23843450/120163

Ira Baxter answered Dec 14 '22 06:12