I'm writing a compiler for a subset of Pascal. The compiler produces machine instructions for a made-up machine. I want to write a peephole optimizer for this machine language, but I'm having trouble substituting some of the more complicated patterns.
I've researched several different approaches to writing a peephole optimizer, and I've settled on a back-end approach:
emit()
function every time a machine instruction is to be generated.emit(Instruction currentInstr)
checks a table of peephole optimizations:
The method is easy enough, it's the implementation I'm having trouble with. In my compiler, machine instructions are stored in an Instruction
class. I wrote an InstructionMatch
class stores regular expressions meant to match each component of a machine instruction. Its equals(Instruction instr)
method returns true
if the patterns match some machine instruction instr
.
However, I can't manage to fully apply the rules I have. First off, I feel that given my current approach, I'll end up with a mess of needless objects. Given that a complete list of peephole optimizations numbers can number around 400 patterns, this will get out of hand fast. Furthermore, I can't actually get more difficult substitutions working with this approach (see "My question").
One paper I've read folds previous instructions into one long string, using regular expressions to match and substitute, and converting the string back to machine instructions. This seemed like a bad approach to me, please correct me if I'm wrong.
x: JUMP x+1; x+1: JUMP y --> x: JUMP y
LOADL x; LOADL y; add --> LOADL x+y
LOADA d[r]; STOREI (n) --> STORE (n) d[r]
Note that each of these example patterns is just a human-readable representation of the following machine instruction template:
op_code register n d
(n usually indicates the number of words, and d an address displacement). The syntax x: <instr>
indicates that the instruction is stored at address x
in the code store.
So, the instruction LOADL 17
is equivalent to the full machine instruction 5 0 0 17
when the LOADL
opcode is 5 (n
and r
are unused in this instruction)
So, given that background, my question is this: How do I effectively match and replace patterns when I need to include parts of previous instructions as variables in my replacement? For example, I can simply replace all instances of LOADL 1; add
with the increment machine instruction - I don't need any part of the previous instructions to do this. But I'm at a loss of how to effectively use the 'x' and 'y' values of my second example in the substitution pattern.
edit: I should mention that each field of an Instruction
class is just an integer (as is normal for machine instructions). Any use of 'x' or 'y' in the pattern table is a variable to stand in for any integer value.
An easy way to do this is to implement your peephole optimizer as a finite state machine.
We assume you have a raw code generator that generates instructions but does not emit them, and an emit routine that sends actual code to the object stream.
The state machine captures instructions that your code generator produces, and remembers sequences of 0 or more generated instructions by transitioning between states. A state thus implicitly remembers a (short) sequence of generated but un-emitted instructions; it also has to remember the key parameters of the instructions it has captured, such as a register name, a constant value, and/or addressing modes and abstract target memory locations. A special start state remembers the empty string of instructions. At any moment, you need to be able to emit the unemitted instructions ("flush"); if you do this all the time, your peephole generator captures the next instruction and then emits it, doing no useful work.
To do useful work, we want the machine to capture as long a sequence as possible. Since there are typically many kinds of machine instructions, as practical matter you can't remember too many in a row or the state machine will become enormous. But it is practical to remember the last two or three for the most common machine instructions (load, add, cmp, branch, store). The size of the machine will really be determined by lenght of the longest peephole optimization we care to do, but if that length is P, the entire machine need not be P states deep.
Each state has transitions to a next state based on the "next" instruction I produced by your code generator. Imagine a state represents the capture of N instructions. The transition choices are:
When flushing the k instructions, what actually gets emitted is the peephole optimized version of those k. You can compute anything you want in emitting such instructions. You also need to remember "shift" the parameters for the remaining instructions appropriately.
This is all pretty easily implemented with a peephole optimizer state variable, and a case statement at each point where your code generator produces its next instruction. The case statement updates the peephole optimizer state and implements the transition operations.
Assume our machine is an augmented stack machine, has
PUSHVAR x
PUSHK i
ADD
POPVAR x
MOVE x,k
instructions, but the raw code generator generates only pure stack machine instructions, e.g., it does not emit the MOV instruction at all. We want the peephole optimizer to do this.
The peephole cases we care about are:
PUSHK i, PUSHK j, ADD ==> PUSHK i+j
PUSHK i, POPVAR x ==> MOVE x,i
Our state variables are:
PEEPHOLESTATE (an enum symbol, initialized to EMPTY)
FIRSTCONSTANT (an int)
SECONDCONSTANT (an int)
Our case statements:
GeneratePUSHK:
switch (PEEPHOLESTATE) {
EMPTY: PEEPHOLESTATE=PUSHK;
FIRSTCONSTANT=K;
break;
PUSHK: PEEPHOLESTATE=PUSHKPUSHK;
SECONDCONSTANT=K;
break;
PUSHKPUSHK:
#IF consumeEmitLoadK // flush state, transition and consume generated instruction
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
SECONDCONSTANT=K;
PEEPHOLESTATE=PUSHKPUSHK;
break;
#ELSE // flush state, transition, and reprocess generated instruction
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
PEEPHOLESTATE=PUSHK;
goto GeneratePUSHK; // Java can't do this, but other langauges can.
#ENDIF
}
GenerateADD:
switch (PEEPHOLESTATE) {
EMPTY: emit(ADD);
break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
emit(ADD);
PEEPHOLESTATE=EMPTY;
break;
PUSHKPUSHK:
PEEPHOLESTATE=PUSHK;
FIRSTCONSTANT+=SECONDCONSTANT;
break:
}
GeneratePOPX:
switch (PEEPHOLESTATE) {
EMPTY: emit(POP,X);
break;
PUSHK: emit(MOV,X,FIRSTCONSTANT);
PEEPHOLESTATE=EMPTY;
break;
PUSHKPUSHK:
emit(MOV,X,SECONDCONSTANT);
PEEPHOLESTATE=PUSHK;
break:
}
GeneratePUSHVARX:
switch (PEEPHOLESTATE) {
EMPTY: emit(PUSHVAR,X);
break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
PEEPHOLESTATE=EMPTY;
goto GeneratePUSHVARX;
PUSHKPUSHK:
PEEPHOLESTATE=PUSHK;
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
goto GeneratePUSHVARX;
}
The #IF shows two different styles of transitions, one that consumes the generated instruction, and one that does not; either works for this example. When you end up with a few hundred of these case statements, you'll find both types handy, with the "don't consume" version helping you keep your code smaller.
We need a routine to flush the peephole optimizer:
flush() {
switch (PEEPHOLESTATE) {
EMPTY: break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
break;
PUSHKPUSHK:
emit(PUSHK,FIRSTCONSTANT),
emit(PUSHK,SECONDCONSTANT),
break:
}
PEEPHOLESTATE=EMPTY;
return; }
It is interesting to consider what this peephole optimizer does with the following generated code:
PUSHK 1
PUSHK 2
ADD
PUSHK 5
POPVAR X
POPVAR Y
What this whole FSA scheme does is hide your pattern matching in the state transitions, and the response to matched patterns in the cases. You can code this by hand, and it is fast and relatively easy to code and debug. But when the number of cases gets large, you don't want to build such a state machine by hand. You can write a tool to generate this state machine for you; good background for this would be FLEX or LALR parser state machine generation. I'm not going to explain this here :-}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With