
How to interpret x86 opcode map?

In looking at an x86 opcode map such as this:

http://www.mlsite.net/8086/#tbl_map1

It defines mappings, for example:

00: ADD Eb,Gb
01: ADD Ev,Gv
...

That link has basic descriptions of what the letters mean, such as:

  • E: A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a displacement.
  • b: Byte argument.

But it's a bit too vague. How do you actually translate that into a complete encoded instruction (the opcode plus its arguments, as machine-code bytes)? I haven't been able to figure it out from the Intel manuals yet either; maybe I'm looking in the wrong place (and it's a bit overwhelming). Seeing a snippet showing the output bytes for an input instruction (and how you got them) would be super helpful.

Lance asked Dec 11 '22 00:12


2 Answers

By all means, use the Intel manuals. For each instruction they give the machine code, and chapter 2 has a very detailed description of the instruction format.

But to give you a walkthrough, let's encode ADD EDX, [EBX+ECX*4+15h]. First we read through chapter 2, INSTRUCTION FORMAT, and section 3.1, INTERPRETING THE INSTRUCTION REFERENCE PAGES, to get an idea of what we will see. We are especially interested in the abbreviations listed at 3.1.1.3 Instruction Column in the Opcode Summary Table.

Armed with that information, we turn to the page describing the ADD instruction and try to identify an appropriate version for the one we want to encode. Our first operand is a 32 bit register and the second is a 32 bit memory location, so let's see what matches that. It's going to be the penultimate line: 03 /r ADD r32, r/m32. We go back to chapter 3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX prefix) to see what that magical /r is: Indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand.

Okay, so Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format showed us how the instruction will look. So far we know that we won't have any prefixes, the opcode will be 03, and we will need at least a ModR/M byte. So let's go see how to figure that out. Look at Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte. The columns represent the register operand, the rows the memory operand. Since our register is EDX we use the 3rd column.

The memory operand is [EBX+ECX*4+15h], which can be encoded using an 8-bit or a 32-bit displacement. To get shorter code we will use the 8-bit version, so the line [--][--]+disp8 applies. This means our ModR/M byte is going to be 54 (in bits: mod = 01 for disp8, reg = 010 for EDX, r/m = 100 for "a SIB byte follows").
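The table lookup is equivalent to packing three bit fields into one byte. A minimal Python sketch of that, assuming the standard register numbering (EAX=0, ECX=1, EDX=2, EBX=3); the helper name `modrm` is made up for illustration:

```python
def modrm(mod, reg, rm):
    """Pack the ModR/M fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    assert 0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7
    return (mod << 6) | (reg << 3) | rm

EDX = 2            # register number of EDX
MOD_DISP8 = 0b01   # memory operand with an 8-bit displacement
RM_SIB = 0b100     # r/m = 100 means "a SIB byte follows"

print(f"{modrm(MOD_DISP8, EDX, RM_SIB):02X}")  # prints 54
```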

We will need a SIB byte too. Those are listed in Table 2-3. 32-Bit Addressing Forms with the SIB Byte. Since our base is EBX we use column 4, and the row for [ECX*4], which gives us our SIB byte of 8B (in bits: scale = 10 for *4, index = 001 for ECX, base = 011 for EBX).
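The SIB byte has the same 2-3-3 bit layout as ModR/M, with the fields scale, index and base. As a rough sketch (the helper name `sib` is hypothetical; register numbers are the standard ones):

```python
def sib(scale, index, base):
    """Pack the SIB fields: scale (2 bits), index (3 bits), base (3 bits)."""
    scale_bits = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}[scale]
    assert 0 <= index <= 7 and 0 <= base <= 7
    return (scale_bits << 6) | (index << 3) | base

ECX, EBX = 1, 3    # register numbers

print(f"{sib(4, ECX, EBX):02X}")  # prints 8B
```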

Finally we add our 8 bit displacement byte, which is 15. The complete instruction is thus 03 54 8B 15. We can verify this with an assembler:

2 00000000 03548B15                add edx, [ebx+ecx*4+15h]
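Putting the steps together, here is a small Python sketch that encodes only this one form, ADD r32, [base+index*scale+disp], always with a SIB byte and a displacement. It is an illustration under those assumptions, not a general assembler; the function name and argument layout are invented for the example:

```python
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALES = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}

def encode_add_r32_mem(dst, base, index, scale, disp):
    """Encode ADD dst, [base + index*scale + disp] using opcode 03 /r."""
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    if -128 <= disp <= 127:
        # Short form: mod = 01, one signed displacement byte.
        mod, disp_bytes = 0b01, disp.to_bytes(1, "little", signed=True)
    else:
        # Long form: mod = 10, four little-endian displacement bytes.
        mod, disp_bytes = 0b10, disp.to_bytes(4, "little", signed=True)
    modrm = (mod << 6) | (REGS[dst] << 3) | 0b100   # r/m = 100: SIB follows
    sib = (SCALES[scale] << 6) | (REGS[index] << 3) | REGS[base]
    return bytes([0x03, modrm, sib]) + disp_bytes

print(encode_add_r32_mem("edx", "ebx", "ecx", 4, 0x15).hex(" ").upper())
# prints 03 54 8B 15
```

With a displacement that does not fit in a signed byte, the same function switches to mod = 10 and emits four displacement bytes instead of one.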
Jester answered Dec 14 '22 23:12


You're looking at an opcode map that translates the first byte of an opcode into the instruction pattern that byte matches. If you want to know about the rest of the bytes of the instruction, you need to look elsewhere.

If you look at the page for the ADD instruction, it will show you something like:

00 /r        ADD r/m8, r8

This tells you that the 00 byte is followed by a ModR/M byte whose register field holds r, an 8-bit register that is the second operand of the ADD instruction (the r8 in the instruction pattern), while the first operand is described by the rest of the ModR/M byte (the mod and r/m fields).
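As a concrete case of that pattern, here is a hedged Python sketch that encodes the register-to-register form, e.g. ADD CL, DL. It assumes mod = 11 (the r/m field names a register directly) and the standard 8-bit register numbering; the function name is made up for the example:

```python
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3,
        "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def encode_add_r8_r8(dst, src):
    """Encode ADD dst, src with opcode 00 /r (ADD r/m8, r8)."""
    # mod = 11: r/m is a register; reg field = source (r8); r/m = destination.
    modrm = (0b11 << 6) | (REG8[src] << 3) | REG8[dst]
    return bytes([0x00, modrm])

print(encode_add_r8_r8("cl", "dl").hex(" ").upper())  # prints 00 D1
```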

Now if you go look at the documentation for ModR/M bytes, it will tell you that a ModR/M byte has 3 fields -- a 2-bit 'mod' field, a 3-bit 'register/opcode' field and a 3-bit 'r/m' field. It then gives a table of all 256 ModR/M byte values, noting what the fields mean in each case. This table is (generally) organized as 32 rows of 8 columns -- the 32 rows are split into 4 groups of 8, with the groups corresponding to the 'mod' field bits and the rows within the groups to the 'r/m' field bits, while the columns correspond to the 'register/opcode' field bits. It's a little weird, as the 'mod' is the top 2 bits and the 'r/m' is the bottom 3 bits with the 'register/opcode' in the middle, but it makes sense: the 'mod' and 'r/m' bits are closely associated and go together to describe one operand, while the 'register/opcode' bits are pretty much completely independent, describing the other operand or being part of the opcode.
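Going the other way, a ModR/M byte can be split back into its fields with shifts and masks, which also tells you where it sits in that 32x8 table. A small sketch, using 0-based row and column numbers as an assumption of this example (the function names are illustrative only):

```python
def split_modrm(byte):
    """Return (mod, reg, rm) for a ModR/M byte."""
    mod = byte >> 6          # top 2 bits
    reg = (byte >> 3) & 0b111  # middle 3 bits
    rm = byte & 0b111        # bottom 3 bits
    return mod, reg, rm

def table_position(byte):
    """Return (row, column) in the 32-row by 8-column ModR/M table, 0-based."""
    mod, reg, rm = split_modrm(byte)
    return mod * 8 + rm, reg   # rows grouped by mod, then r/m; columns by reg

print(split_modrm(0x54))     # prints (1, 2, 4)
print(table_position(0x54))  # prints (12, 2)
```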

Chris Dodd answered Dec 14 '22 23:12