Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compiler written in Java: Peephole optimizer implementation

Tags:

I'm writing a compiler for a subset of Pascal. The compiler produces machine instructions for a made-up machine. I want to write a peephole optimizer for this machine language, but I'm having trouble substituting some of the more complicated patterns.

Peephole optimizer specification

I've researched several different approaches to writing a peephole optimizer, and I've settled on a back-end approach:

  • The Encoder makes a call to an emit() function every time a machine instruction is to be generated.
  • emit(Instruction currentInstr) checks a table of peephole optimizations:
    • If the current instruction matches the tail of a pattern:
      1. Check previously emitted instructions for matching
      2. If all instructions matched the pattern, apply the optimization, modifying the tail end of the code store
    • If no optimization was found, emit the instruction as usual

Current design approach

The method is easy enough, it's the implementation I'm having trouble with. In my compiler, machine instructions are stored in an Instruction class. I wrote an InstructionMatch class stores regular expressions meant to match each component of a machine instruction. Its equals(Instruction instr) method returns true if the patterns match some machine instruction instr.

However, I can't manage to fully apply the rules I have. First off, I feel that given my current approach, I'll end up with a mess of needless objects. Given that a complete list of peephole optimizations numbers can number around 400 patterns, this will get out of hand fast. Furthermore, I can't actually get more difficult substitutions working with this approach (see "My question").

Alternate approaches

One paper I've read folds previous instructions into one long string, using regular expressions to match and substitute, and converting the string back to machine instructions. This seemed like a bad approach to me, please correct me if I'm wrong.

Example patterns, pattern syntax

x: JUMP x+1; x+1: JUMP y  -->  x: JUMP y
LOADL x; LOADL y; add     -->  LOADL x+y
LOADA d[r]; STOREI (n)    -->  STORE (n) d[r]

Note that each of these example patterns is just a human-readable representation of the following machine instruction template:

op_code register n d

(n usually indicates the number of words, and d an address displacement). The syntax x: <instr> indicates that the instruction is stored at address x in the code store.

So, the instruction LOADL 17 is equivalent to the full machine instruction 5 0 0 17 when the LOADL opcode is 5 (n and r are unused in this instruction)

My question

So, given that background, my question is this: How do I effectively match and replace patterns when I need to include parts of previous instructions as variables in my replacement? For example, I can simply replace all instances of LOADL 1; add with the increment machine instruction - I don't need any part of the previous instructions to do this. But I'm at a loss of how to effectively use the 'x' and 'y' values of my second example in the substitution pattern.

edit: I should mention that each field of an Instruction class is just an integer (as is normal for machine instructions). Any use of 'x' or 'y' in the pattern table is a variable to stand in for any integer value.

like image 617
David Cain Avatar asked May 10 '12 21:05

David Cain


1 Answers

An easy way to do this is to implement your peephole optimizer as a finite state machine.

We assume you have a raw code generator that generates instructions but does not emit them, and an emit routine that sends actual code to the object stream.

The state machine captures instructions that your code generator produces, and remembers sequences of 0 or more generated instructions by transitioning between states. A state thus implicitly remembers a (short) sequence of generated but un-emitted instructions; it also has to remember the key parameters of the instructions it has captured, such as a register name, a constant value, and/or addressing modes and abstract target memory locations. A special start state remembers the empty string of instructions. At any moment, you need to be able to emit the unemitted instructions ("flush"); if you do this all the time, your peephole generator captures the next instruction and then emits it, doing no useful work.

To do useful work, we want the machine to capture as long a sequence as possible. Since there are typically many kinds of machine instructions, as practical matter you can't remember too many in a row or the state machine will become enormous. But it is practical to remember the last two or three for the most common machine instructions (load, add, cmp, branch, store). The size of the machine will really be determined by lenght of the longest peephole optimization we care to do, but if that length is P, the entire machine need not be P states deep.

Each state has transitions to a next state based on the "next" instruction I produced by your code generator. Imagine a state represents the capture of N instructions. The transition choices are:

  • flush the leftmost 0 or more (call this k) instructions that this state represents, and transition to a next state, representing N-k+1, instructions that represents the additional capture of machine instruction I.
  • flush the leftmost k instructions this state represents, transition to the state that represents the remaining N-k instructions, and reprocess instruction I.
  • flush the state completely, and emit instruction I, too. [You can actually do this on the just the start state].

When flushing the k instructions, what actually gets emitted is the peephole optimized version of those k. You can compute anything you want in emitting such instructions. You also need to remember "shift" the parameters for the remaining instructions appropriately.

This is all pretty easily implemented with a peephole optimizer state variable, and a case statement at each point where your code generator produces its next instruction. The case statement updates the peephole optimizer state and implements the transition operations.

Assume our machine is an augmented stack machine, has

 PUSHVAR x
 PUSHK i
 ADD
 POPVAR x
 MOVE x,k

instructions, but the raw code generator generates only pure stack machine instructions, e.g., it does not emit the MOV instruction at all. We want the peephole optimizer to do this.

The peephole cases we care about are:

 PUSHK i, PUSHK j, ADD ==> PUSHK i+j
 PUSHK i, POPVAR x ==> MOVE x,i 

Our state variables are:

 PEEPHOLESTATE (an enum symbol, initialized to EMPTY)
 FIRSTCONSTANT (an int)
 SECONDCONSTANT (an int)

Our case statements:

GeneratePUSHK:
    switch (PEEPHOLESTATE) {
        EMPTY: PEEPHOLESTATE=PUSHK;
               FIRSTCONSTANT=K;
               break;
        PUSHK: PEEPHOLESTATE=PUSHKPUSHK;
               SECONDCONSTANT=K;
               break;
        PUSHKPUSHK:
        #IF consumeEmitLoadK // flush state, transition and consume generated instruction
               emit(PUSHK,FIRSTCONSTANT);
               FIRSTCONSTANT=SECONDCONSTANT;
               SECONDCONSTANT=K;
               PEEPHOLESTATE=PUSHKPUSHK;
               break;
        #ELSE // flush state, transition, and reprocess generated instruction
               emit(PUSHK,FIRSTCONSTANT);
               FIRSTCONSTANT=SECONDCONSTANT;
               PEEPHOLESTATE=PUSHK;
               goto GeneratePUSHK;  // Java can't do this, but other langauges can.
        #ENDIF
     }

  GenerateADD:
    switch (PEEPHOLESTATE) {
        EMPTY: emit(ADD);
               break;
        PUSHK: emit(PUSHK,FIRSTCONSTANT);
               emit(ADD);
               PEEPHOLESTATE=EMPTY;
               break;
        PUSHKPUSHK:
               PEEPHOLESTATE=PUSHK;
               FIRSTCONSTANT+=SECONDCONSTANT;
               break:
     }  

  GeneratePOPX:
    switch (PEEPHOLESTATE) {
        EMPTY: emit(POP,X);
               break;
        PUSHK: emit(MOV,X,FIRSTCONSTANT);
               PEEPHOLESTATE=EMPTY;
               break;
        PUSHKPUSHK:
               emit(MOV,X,SECONDCONSTANT);
               PEEPHOLESTATE=PUSHK;
               break:
     }

GeneratePUSHVARX:
    switch (PEEPHOLESTATE) {
        EMPTY: emit(PUSHVAR,X);
               break;
        PUSHK: emit(PUSHK,FIRSTCONSTANT);
               PEEPHOLESTATE=EMPTY;
               goto GeneratePUSHVARX;
        PUSHKPUSHK:
               PEEPHOLESTATE=PUSHK;
               emit(PUSHK,FIRSTCONSTANT);
               FIRSTCONSTANT=SECONDCONSTANT;
               goto GeneratePUSHVARX;
     }

The #IF shows two different styles of transitions, one that consumes the generated instruction, and one that does not; either works for this example. When you end up with a few hundred of these case statements, you'll find both types handy, with the "don't consume" version helping you keep your code smaller.

We need a routine to flush the peephole optimizer:

  flush() {
    switch (PEEPHOLESTATE) {
        EMPTY: break;
        PUSHK: emit(PUSHK,FIRSTCONSTANT);
               break;
        PUSHKPUSHK:
               emit(PUSHK,FIRSTCONSTANT),
               emit(PUSHK,SECONDCONSTANT),
               break:
      }
      PEEPHOLESTATE=EMPTY;
      return; }

It is interesting to consider what this peephole optimizer does with the following generated code:

      PUSHK  1
      PUSHK  2
      ADD
      PUSHK  5
      POPVAR X
      POPVAR Y

What this whole FSA scheme does is hide your pattern matching in the state transitions, and the response to matched patterns in the cases. You can code this by hand, and it is fast and relatively easy to code and debug. But when the number of cases gets large, you don't want to build such a state machine by hand. You can write a tool to generate this state machine for you; good background for this would be FLEX or LALR parser state machine generation. I'm not going to explain this here :-}

like image 143
Ira Baxter Avatar answered Oct 22 '22 15:10

Ira Baxter