Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

x86 instruction encoding tables

I'm in middle of rewriting my assembler. While at it I'm curious about implementing disassembly as well. I want to make it simple and compact, and there's concepts I can exploit while doing so.

It is possible to determine rest of the x86 instruction encoding from opcode (maybe prefix bytes are required too, a bit). I know many people have written tables for doing it.

I'm not interested about mnemonics but instruction encoding, because it is an actual hard problem there. For each opcode number I need to know:

  • does this instruction contain modrm?
  • how many immediate fields does this instruction have?
  • what encoding does an immediate use?
  • is the immediate in field an instruction pointer -relative address?
  • what kind of registers does the modrm use for operand and register fields?

sandpile.org has somewhat quite much what I'd need, but it's in format that isn't easy to parse.

Before I start writing and validating those tables myself, I decided to write this question. Do you know about this kind of tables existing somewhere? In a form that doesn't require too much effort to parse.

b   byte
w   word
v   word or dword (or qword), depends on operand size attribute (0x66)
z   word or dword (or dword), depends on operand size attribute
J   instruction-relative address (next character describes type)
G   instruction group, has modrm-field (next character describes operand type)
R   has modrm-field (next two characters describe register and operand type)
M   modrm, but operand field must point to memory
O   direct offset (next character describes type)
F   FPU
T   separate table
_   defined, but no arguments

x    0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0  Rbb  Rvv  Rbb  Rvv    b    z            Rbb  Rvv  Rbb  Rvv    b    z         T
1  Rbb  Rvv  Rbb  Rvv    b    z            Rbb  Rvv  Rbb  Rvv    b    z
2  Rbb  Rvv  Rbb  Rvv    b    z            Rbb  Rvv  Rbb  Rvv    b    z
3  Rbb  Rvv  Rbb  Rvv    b    z            Rbb  Rvv  Rbb  Rvv    b    z
4    _    _    _    _    _    _    _    _    _    _    _    _    _    _    _    _
5    _    _    _    _    _    _    _    _    _    _    _    _    _    _    _    _
6    _    _  Mvv                             z Rvvz    b Rvvb
7   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb   Jb
8  Gbb  Gvz  Gbb  Gvb  Rbb  Rvv  Rbb  Rvv  Rbb  Rvv  Rbb  Rvv       Mvv
9    _    _    _    _    _    _    _    _                        _    _    _    _
A   Ob   Ov   Ob   Ov    _    _    _    _    b    z    _    _    _    _    _    _
B    b    b    b    b    b    b    b    b    v    v    v    v    v    v    v    v
C  Gbb  Gvb    w    _                                            _    b    _    _
D   Gb   Gv   Gb   Gv                        F    F    F    F    F    F    F    F
E                                           Jz   Jz        Jb
F                        _    _   Gb   Gv    _    _    _    _    _    _   Gb   Gv

Here I've got the table for first operand. The format is such that the table can be parsed straight out from a text file that contains it. I left away some CISC and segmentation related instructions.

For two-byte instructions the chances are I need four such tables. For three-byte instructions I'll need two tables more. FPU instructions require 8 tables, which are fortunately very simple. After that I'd have pretty large chunk of x86 instructions covered up. Though I go just fine with just one or two tables.

Further, few instruction groups might require some small arrays to recognise instruction type.

like image 325
Cheery Avatar asked May 18 '10 07:05

Cheery


2 Answers

I believe ref.x86asm.net might have what you're looking for. It's a list of all x86-64 instructions, in an XML format that should be easy to parse.

like image 158
Martin Avatar answered Oct 18 '22 23:10

Martin


IIRC for the internal assembler of the Free Pascal compiler, we initially used tables extracted from the NASM sources.

like image 44
Marco van de Voort Avatar answered Oct 18 '22 21:10

Marco van de Voort