 

How is a relative JMP (x86) implemented in an Assembler?

While building my assembler for the x86 platform, I ran into some problems encoding the JMP instruction:

OPCODE   INSTRUCTION   SIZE
 EB cb     JMP rel8     2
 E9 cw     JMP rel16    4 (because of 0x66 16-bit prefix)
 E9 cd     JMP rel32    5
 ...

(from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147)

All of these are relative jumps; the third column gives the size in bytes of each encoding (opcode + operand).
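To make the table concrete, here is a minimal sketch of picking an encoding for a jump whose address and target are both already known. The function name `encode_jmp` is hypothetical. The sketch only chooses between rel8 and rel32, because in 32-bit code a JMP rel16 (66 E9 cw) also truncates EIP to 16 bits, so it is rarely useful there; that restriction is an assumption of the sketch, not something the table states. The key detail is that the displacement is measured from the end of the JMP, so it has to be recomputed for each candidate size.

```python
import struct

def encode_jmp(addr: int, target: int) -> bytes:
    """Encode a relative JMP placed at `addr` that jumps to `target`."""
    # JMP rel8: EB cb, 2 bytes; displacement counted from addr + 2.
    disp = target - (addr + 2)
    if -128 <= disp <= 127:
        return b"\xEB" + struct.pack("<b", disp)
    # JMP rel32: E9 cd, 5 bytes; displacement counted from addr + 5.
    disp = target - (addr + 5)
    return b"\xE9" + struct.pack("<i", disp)

print(encode_jmp(0, 0).hex())          # "jmp $": ebfe
print(encode_jmp(0x100, 0x182).hex())  # e97d000000
```

The second call shows the subtlety: measured from the end of a 2-byte jump, the distance is 128 and misses the rel8 range by one, while the rel32 displacement is only 125.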

My original (and, as it turns out, faulty) design reserved the maximum space (5 bytes) for each jump instruction, because the operand is not yet known: the jump may go to a location that hasn't been assembled yet. So I implemented a "rewrite" mechanism that, once the location of the jump target is known, writes the operand into the correct place in memory and fills the rest with NOPs. This is a somewhat serious concern in tight loops.
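A minimal sketch of that reserve-and-patch scheme, assuming a hypothetical `Emitter` class (not from the question): every JMP reserves 5 bytes, its target is patched in once all labels are known, and a jump that fits in rel8 is padded with three NOPs (0x90).

```python
import struct

NOP = 0x90

class Emitter:
    """Hypothetical single-pass emitter: reserve 5 bytes per JMP, patch later."""

    def __init__(self):
        self.code = bytearray()
        self.labels = {}   # label name -> address
        self.fixups = []   # (address of the JMP, label name)

    def emit(self, raw: bytes):
        self.code += raw

    def label(self, name: str):
        self.labels[name] = len(self.code)

    def jmp(self, name: str):
        # Target may be unknown yet: reserve the maximum size (JMP rel32).
        self.fixups.append((len(self.code), name))
        self.code += bytes(5)

    def patch(self) -> bytes:
        for addr, name in self.fixups:
            target = self.labels[name]
            disp8 = target - (addr + 2)
            if -128 <= disp8 <= 127:
                # Short form fits: EB cb, then 3 NOPs fill the reserved slot.
                self.code[addr:addr + 5] = b"\xEB" + struct.pack("<b", disp8) + bytes([NOP] * 3)
            else:
                disp32 = target - (addr + 5)
                self.code[addr:addr + 5] = b"\xE9" + struct.pack("<i", disp32)
        return bytes(self.code)

e = Emitter()
e.jmp("end")
e.emit(b"\x90")
e.label("end")
print(e.patch().hex())  # eb0490909090
```

The three trailing NOPs in the output are exactly the padding the question wants to eliminate.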

Now my problem is with the following situation:

b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX      (where XXX is any instruction, depending
             on the to-be assembled program)

The problem is that I want the smallest possible encoding for a JMP instruction (and no NOP filling).

I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies to the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a.
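Writing the dependency out makes the cycle explicit. For illustration, let $s_X$ be the size of the instruction at label $X$, let $S$ be the total size of everything from $e$ up to (not including) $d$, measure each displacement from the end of its jump, and ignore the 4-byte rel16 form:

$$
\begin{aligned}
\mathrm{rel}_c &= a - e = S + s_d,\\
\mathrm{rel}_d &= b - a = -(s_b + s_c + S + s_d),\\
s_j &= \begin{cases} 2 & \text{if } -128 \le \mathrm{rel}_j \le 127 \quad (\texttt{EB cb})\\ 5 & \text{otherwise} \quad (\texttt{E9 cd}) \end{cases}
\end{aligned}
$$

So $s_c$ depends on $\mathrm{rel}_c$, which depends on $s_d$, which depends on $\mathrm{rel}_d$, which depends on $s_c$ again. The system can even have more than one consistent solution: with $s_b = 1$ and $S = 123$, choosing $s_c = s_d = 2$ gives $\mathrm{rel}_c = 125$ and $\mathrm{rel}_d = -128$ (both fit in rel8), while choosing $s_c = s_d = 5$ gives $\mathrm{rel}_c = 128$ and $\mathrm{rel}_d = -134$ (both need rel32). Which one an assembler ends up with depends on its starting assumption.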

How do existing assemblers solve this problem, or how would you do this?

Here is what I have in mind to solve the problem:

First, encode all the instructions between the JMP and its target; if this region contains a variable-sized instruction, assume its maximum size (e.g. 5 bytes for a JMP). Then encode the relative JMP to its target, choosing the smallest possible encoding (2, 4 or 5 bytes) for the computed distance. Whenever a variable-sized instruction is encoded, update all absolute operands affected by it, and re-encode all relative instructions that skip over it, again choosing the smallest possible size for their new operands. This method is guaranteed to terminate, because variable-sized instructions can only shrink (they start at their maximum size).
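A minimal sketch of this shrink-from-maximum idea, as one possible reading of the proposal rather than the asker's code. It uses a hypothetical toy program model: a list of `("label", name)`, `("insn", size)` and `("jmp", label)` tuples, with only rel8 and rel32 considered. `question_example` rebuilds the b/c/e/d/a program with made-up sizes.

```python
SHORT, LONG = 2, 5  # sizes of JMP rel8 (EB cb) and JMP rel32 (E9 cd)

def question_example(gap):
    """The question's b/c/e/d/a program. Hypothetical sizes: the XXX at b and
    at a are 1 byte each; everything from e up to d is `gap` bytes."""
    return [("label", "b"), ("insn", 1), ("jmp", "a"),
            ("insn", gap), ("jmp", "b"), ("label", "a"), ("insn", 1)]

def layout(prog, sizes):
    """Assign an address to every item, given the current size of each jump."""
    addrs, labels, pc = [], {}, 0
    for i, (kind, arg) in enumerate(prog):
        addrs.append(pc)
        if kind == "label":
            labels[arg] = pc
        elif kind == "insn":
            pc += arg
        else:  # "jmp"
            pc += sizes[i]
    return addrs, labels

def rel8_disp(prog, sizes, addrs, labels, i):
    """Displacement jump i would have if it were shrunk to rel8."""
    target = labels[prog[i][1]]
    if target > addrs[i]:
        # Forward: shrinking i moves its end and the target by the same amount.
        return target - (addrs[i] + sizes[i])
    # Backward: the target does not move when jump i shrinks.
    return target - (addrs[i] + SHORT)

def assemble_shrinking(prog):
    # Pessimistic start: every jump gets the 5-byte rel32 encoding.
    sizes = {i: LONG for i, (kind, _) in enumerate(prog) if kind == "jmp"}
    changed = True
    while changed:
        changed = False
        addrs, labels = layout(prog, sizes)
        for i in sizes:
            if sizes[i] == LONG and -128 <= rel8_disp(prog, sizes, addrs, labels, i) <= 127:
                # Shrinking never lengthens another jump's span, so sizes only
                # ever decrease and the loop terminates.
                sizes[i] = SHORT
                changed = True
    return sizes

print(assemble_shrinking(question_example(100)))  # {2: 2, 4: 2}
print(assemble_shrinking(question_example(123)))  # {2: 5, 4: 5}
```

The second run shows a limitation of starting from the maximum: with a 123-byte gap both jumps stay at 5 bytes, even though both would fit in rel8 if both were short, because each one only fits after the other has already shrunk.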

I suspect this may be an over-engineered solution, which is why I'm asking.

asked May 11 '10 21:05 by Pindatjuh



2 Answers

Here's one approach I've used that may seem inefficient but turns out not to be for most real-life code (pseudo-code):

IP := 0;
do
{
  done = true;
  while (IP < length)
  {
    if Instr[IP] is jump
      if backwards
      { Target known
          Encode short/long as needed }
      else
      { Target unknown
          if (!marked as needing long encoding) // see below
            Encode short
          else
            Encode long
          Record location for fixup }
    IP++;
  }
  foreach Fixup do
    if Jump > short // displacement doesn't fit in rel8
      Mark Jump location as requiring long encoding
      IP := FixupLocation; // restart at instruction that needs size change
      Drop fixups recorded at or after FixupLocation // earlier ones stay and are re-checked
      done = false;
      break; // out of foreach fixup
    else
      encode jump
} while (!done);
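The pseudo-code above can be sketched as runnable Python in a hypothetical toy model (a list of `("label", name)`, `("insn", size)` and `("jmp", label)` tuples, rel8 and rel32 only). For simplicity the sketch recomputes the whole layout on every pass instead of restarting at the fixup location, and it treats forward and backward jumps alike; `question_example` is a made-up rebuild of the question's program.

```python
SHORT, LONG = 2, 5  # JMP rel8 (EB cb) and JMP rel32 (E9 cd) sizes in bytes

def question_example(gap):
    """Hypothetical sizes: XXX at b and a are 1 byte, e..d spans `gap` bytes."""
    return [("label", "b"), ("insn", 1), ("jmp", "a"),
            ("insn", gap), ("jmp", "b"), ("label", "a"), ("insn", 1)]

def layout(prog, sizes):
    """Assign an address to every item, given the current size of each jump."""
    addrs, labels, pc = [], {}, 0
    for i, (kind, arg) in enumerate(prog):
        addrs.append(pc)
        if kind == "label":
            labels[arg] = pc
        elif kind == "insn":
            pc += arg
        else:  # "jmp"
            pc += sizes[i]
    return addrs, labels

def assemble_growing(prog):
    # Optimistic start: every jump is assumed to fit in rel8.
    sizes = {i: SHORT for i, (kind, _) in enumerate(prog) if kind == "jmp"}
    while True:
        addrs, labels = layout(prog, sizes)
        grown = False
        for i in sizes:
            if sizes[i] == SHORT:
                disp = labels[prog[i][1]] - (addrs[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = LONG  # "mark as requiring long encoding"
                    grown = True
        if not grown:
            # Every remaining short jump was checked against this final layout.
            return sizes

print(assemble_growing(question_example(123)))  # {2: 2, 4: 2}
print(assemble_growing(question_example(124)))  # {2: 5, 4: 5}
```

Growing a jump can only lengthen other spans, so a jump found out of range stays out of range and it is safe to grow several in one pass; since sizes only increase, the loop terminates.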
answered Jan 03 '23 13:01 by 500 - Internal Server Error



In the first pass, count bytes pessimistically (assume the longest encoding for every jump instruction); this already gives a very good approximation of which jmp encoding each jump needs.

On the second pass you can fill in the jumps using the encodings chosen pessimistically. Very few jumps could then be rewritten to save a byte or two: only those whose distance was originally very close to the 8/16-bit or 16/32-bit threshold. Because these candidates all span many bytes, they are less likely to sit in critical loops, so you will probably find that further passes offer little or no benefit over a two-pass solution.
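A minimal sketch of this two-pass scheme in a hypothetical toy model (a list of `("label", name)`, `("insn", size)` and `("jmp", label)` tuples, rel8 and rel32 only; `question_example` uses made-up sizes). Pass 1 lays the program out with every jump at 5 bytes and picks rel8 for any jump whose pessimistic displacement already fits. Real sizes can only be smaller, so spans can only shrink and those choices stay valid. Pass 2 lays out again with the chosen sizes and returns each jump's final size and displacement.

```python
SHORT, LONG = 2, 5  # JMP rel8 (EB cb) and JMP rel32 (E9 cd) sizes in bytes

def question_example(gap):
    """Hypothetical sizes: XXX at b and a are 1 byte, e..d spans `gap` bytes."""
    return [("label", "b"), ("insn", 1), ("jmp", "a"),
            ("insn", gap), ("jmp", "b"), ("label", "a"), ("insn", 1)]

def layout(prog, sizes):
    """Assign an address to every item, given the size of each jump."""
    addrs, labels, pc = [], {}, 0
    for i, (kind, arg) in enumerate(prog):
        addrs.append(pc)
        if kind == "label":
            labels[arg] = pc
        elif kind == "insn":
            pc += arg
        else:  # "jmp"
            pc += sizes[i]
    return addrs, labels

def assemble_two_pass(prog):
    jumps = [i for i, (kind, _) in enumerate(prog) if kind == "jmp"]
    # Pass 1: pessimistic byte counting, every jump assumed to be rel32.
    addrs, labels = layout(prog, {i: LONG for i in jumps})
    sizes = {}
    for i in jumps:
        target = labels[prog[i][1]]
        if target > addrs[i]:
            disp = target - (addrs[i] + LONG)   # forward: bytes between jump and target
        else:
            disp = target - (addrs[i] + SHORT)  # backward, measured as if short
        sizes[i] = SHORT if -128 <= disp <= 127 else LONG
    # Pass 2: real layout with the chosen sizes; short choices still fit.
    addrs, labels = layout(prog, sizes)
    return {i: (sizes[i], labels[prog[i][1]] - (addrs[i] + sizes[i])) for i in jumps}

print(assemble_two_pass(question_example(100)))  # {2: (2, 102), 4: (2, -105)}
print(assemble_two_pass(question_example(123)))  # {2: (5, 128), 4: (5, -134)}
```

With a 123-byte gap both jumps stay long here, even though two short jumps would fit; that is the kind of near-threshold case the answer expects to be rare.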

answered Jan 03 '23 12:01 by CB Bailey
