Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make LLVM prefer one machine instruction over another?

Suppose I have two register computational blocks in the target machine: I and X. One may apply only integer operations to I-registers and both integer and float operations to X-registers. There're two types of instructions as well:

def ADDIi32 : MyInstruction< ..., (outs I:$Rs), (ins I:$Rm, I:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn)]>;
...

def ADDXi32 : MyInstruction< ..., (outs X:$Rs), (ins X:$Rm, X:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn)]>;
def ADDXf32 : MyInstruction< ..., (outs X:$Rs), (ins X:$Rm, X:$Rn), [(set f32:$Rs, (fadd f32:$Rm, f32:$Rn)]>;
...

They are differently encoded and have different asmstrings. So llvm can map

int a, b;
a = a + b;

To ADDIi32 or ADDXi32 but

float a, b;
a = a + b;

gets mapped to ADDXf32 only.

I would like LLVM to use ADDIi32 when possible but unfortunately I found no way to tell it that one instruction (or register) "costs" more than another. CostPerUse in Register class seems to be a candidate but it defines cost among other registers in group, not among all registers. I saw some threads claiming that AddedComplexity (field in Instruction class with ambiguous description) controls choosing patterns but does nothing as well.

It seems that LLVM chooses the first matching instruction. When I define ADDXf32 and ADDXi32 before ADDIi32 it uses only X-functions for any type of data. When I define ADDIi32 before ADDXf32 and ADDXi32 it uses I-instruction for integer data but matches nothing to float data (weird). I suppose I could insert ADDIi32 between ADDXf32 and ADDXi32 but they are currently in different .td files and it doesn't look like long-term solution.

Any ideas?

like image 320
ScalewingedAcidicorn Avatar asked Jul 15 '15 09:07

ScalewingedAcidicorn


2 Answers

Note that order of declaration does not affect the instruction selection: LLVM only sorts by matching more complex patterns first (which is what AddedComplexity affects -- note increasing AddedComplexity increases the chances of matching the pattern), tiebreaking by matching instructions with smaller CodeSize, finally tiebreaking nondeterministically. Hence you cannot rely on order of declaration.


EDIT: the below part of my original answer doesn't actually apply to your question -- see http://lists.llvm.org/pipermail/llvm-dev/2007-September/010833.html. In particular:

ISel patterns are matched against DAGs before register allocation. So ISel patterns can't take into account register classes. The register classes in the patterns are a selection constraint for the register allocator if that instruction is chosen, not a constraint on choosing that pattern.


In addition to I and X register classes, you should have a "IntRegsRegClass" or similar, in which you add both I and X registers. (This is the reg class that you call addRegisterClass on.)

The order in which you add the registers in this register class determines the preferred allocation order.

like image 106
Cheng Sun Avatar answered Sep 17 '22 12:09

Cheng Sun


This looks suspiciously similar to m68k with its split-brain D(ata) and A(ddress) registers. To handwave massively, D registers are close to the ALU and are used for integer operations whereas A registers are close to the address-calculation unit. This means that if a pointer is in a D register, it will need to be copied to an A register before dereferencing it.

Addition is an operation that makes sense for both kinds of register since one increments both counters and pointers, and so there are separate ADD and ADDA instructions. They're probably implemented completely differently in silicon, but the only programmer-visible difference is that ADD updates the condition codes based on the result, whereas ADDA doesn't affect them.

Several other instructions also come in a separate address-special form, including MOVE, which is how I tripped over the same problem that you're having when implementing separate MOVE and MOVEA for a m68k backend and being annoyed that the wrong one was being picked. This is how I ended up finding this SO post, and Cheng Sun's answer told me what I didn't really want to hear but did need to know.

My new approach is to define three register classes, Dn, An and Xn (Xn being an existing convention on m68k), consisting of D, A, and both D and A respectively. I've merged ADD/ADDA into a single new pseudo-instruction ADDZ (ADDX already exists) which acts on the Xn class in which the condition codes are undefined. I plan to later add a fixup that converts such pseudo-instructions into the appropriate valid instruction and delete any redundant TST instructions for cases where the condition codes were already set.

This has stopped the pattern-matching pass from e.g. issuing an ADDA followed by a MOVE when an ADD makes more sense. Because the register allocator avoids redundant moves, pointer arithmetic will be performed using A registers even though they are listed last in the Xn class. Were I also supporting floating-point, the FADD instruction would be defined to use a separate FPx register class and so not conflict with the existing ADD and ADDZ patterns.

In the case of your I and X register example, I would create GPR and XR classes for both I and X, and X classes respectively. The GPR class would also list all I registers first so that X registers are only used as a last resort. I would then write your example patterns like this:

def ADDi32 : MyInstruction< ..., (outs GPR:$Rs), (ins GPR:$Rm, GPR:$Rn), [(set i32:$Rs, (add i32:$Rm, i32:$Rn)]>;
def ADDf32 : MyInstruction< ..., (outs XR:$Rs), (ins XR:$Rm, XR:$Rn), [(set f32:$Rs, (fadd f32:$Rm, f32:$Rn)]>;
like image 29
pndc Avatar answered Sep 17 '22 12:09

pndc