It is well known how to convert code from SSA representation to a register machine. (Basically, graph coloring register allocation is the core of such conversion.)
But what's the general method for converting from SSA to a stack machine? (CIL byte code, in the case I'm looking at.) I would expect it to be simpler, given that there's no need for register allocation?
It's been over 15 years since I was involved in compiler construction, so I might not remember all the details.
Basically, when going out of SSA, you need to generate load/store instructions into virtual registers at the end of every block that leads to a phi-node in a successor block. This usually produces more virtual registers than the actual machine has, so you then apply register allocation to map them onto real registers, spilling to the stack the values that don't fit.
For a stack-based machine, you just don't do that last step. You end up with roughly as many virtual registers as there are phi-nodes in the compiled function (the algorithm is actually not trivial; a good starting point is the paper Efficiently Computing Static Single Assignment Form and the Control Dependence Graph by Ron Cytron, Jeanne Ferrante, et al.). These virtual registers will be your local variables.
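A minimal sketch of that out-of-SSA step in Python, assuming a toy IR with hypothetical Phi and Block classes (none of these names come from the answer or from a real framework): each phi destination is given a local-variable slot, and every predecessor block gets a copy into that slot appended at its end.

    # Hypothetical toy IR; instructions are kept as plain strings.
    class Phi:
        def __init__(self, dest, sources):
            self.dest = dest          # SSA name defined by the phi
            self.sources = sources    # {predecessor block name: incoming SSA value}

    class Block:
        def __init__(self, name, phis=None, code=None):
            self.name = name
            self.phis = phis or []    # phi nodes at the top of the block
            self.code = code or []    # ordinary instructions

    def eliminate_phis(blocks):
        """Replace every phi with copies at the end of its predecessors.
        Returns a map from SSA names to local-variable slots."""
        by_name = {b.name: b for b in blocks}
        slot_of = {}                  # SSA name -> local variable index
        next_slot = 0

        def slot(name):
            nonlocal next_slot
            if name not in slot_of:
                slot_of[name] = next_slot
                next_slot += 1
            return slot_of[name]

        for block in blocks:
            for phi in block.phis:
                dest = slot(phi.dest)
                for pred, value in phi.sources.items():
                    # Copy the incoming value into the phi's slot at the
                    # end of the predecessor (before its branch, which this
                    # sketch does not model).
                    by_name[pred].code.append(f"local[{dest}] = {value}")
            block.phis = []           # phis are gone after this pass
        return slot_of

A real pass also has to split critical edges and handle the lost-copy and swap problems; this sketch ignores all of that.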
When reading a value from a virtual register (local variable) to be used by an operation, you first use an instruction to push it onto the stack. The Java VM iload index instruction is an example: it loads the local variable at index and pushes its value onto the stack (see https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.iload).
When writing a value into a local variable, you pop it from the stack; see the Java VM istore index instruction (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.istore).
For example, if after going out of SSA you need to encode
local[5] = MUL local[2], local[4]
then you need to generate something like this:
ILOAD 4
ILOAD 2
MUL
ISTORE 5
For CIL bytecode, the equivalent operations for local variables are ldloc and stloc (ldarg and starg are the corresponding pair for arguments).
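To make the shape of the translation concrete, here is a small Python sketch; the three-address tuple format is invented for the example, and the CIL-flavoured mnemonics are emitted as plain strings (a real emitter would go through something like System.Reflection.Emit or Mono.Cecil instead).

    # Invented three-address form: (dest_local, op, src_local_1, src_local_2)
    OPCODES = {"MUL": "mul", "ADD": "add", "SUB": "sub"}

    def translate(instr):
        dest, op, left, right = instr
        return [
            f"ldloc {left}",      # push the first operand
            f"ldloc {right}",     # push the second operand
            OPCODES[op],          # pop both, push the result
            f"stloc {dest}",      # pop the result into the destination local
        ]

    # local[5] = MUL local[2], local[4]
    print(translate((5, "MUL", 2, 4)))
    # ['ldloc 2', 'ldloc 4', 'mul', 'stloc 5']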
Of course, there is a lot of room for optimizations, to avoid redundant load/stores.
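The most common redundancy is a value that is stored into a local and immediately reloaded, with no other use of that local; it can simply be left on the stack. A sketch of such a peephole, assuming the liveness information is supplied by the caller:

    def drop_redundant_store_load(instrs, locals_used_elsewhere):
        """Remove 'stloc N' immediately followed by 'ldloc N' when local N
        has no other uses (locals_used_elsewhere is a set of indices)."""
        out = []
        i = 0
        while i < len(instrs):
            cur = instrs[i]
            nxt = instrs[i + 1] if i + 1 < len(instrs) else None
            if (nxt is not None
                    and cur.startswith("stloc ")
                    and nxt == "ldloc " + cur.split()[1]
                    and int(cur.split()[1]) not in locals_used_elsewhere):
                i += 2            # the value is already on the stack; keep it there
            else:
                out.append(cur)
                i += 1
        return out

    # ['ldloc 2', 'ldloc 4', 'mul', 'stloc 5', 'ldloc 5', 'stloc 6']
    # becomes ['ldloc 2', 'ldloc 4', 'mul', 'stloc 6'] when local 5 is dead.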
SSA is basically a set of "logic" gates, each with multiple inputs and typically one output.
So essentially you need to treat each gate as a set of stack pushes for the inputs, followed by a zero-operand operator that combines the stack values into the result for that gate. For instance, a + b * c as SSA with a multiply-and-accumulate operator has 3 pushes for a,b,c followed by a MAC_TOS operator.
If one has a chain of such gates, you can take the output of an earlier gate, which is already on the stack, and simply act as if it had just been pushed.
So an SSA computation looks like an n-ary tree of gates with the output coming out at the root.
You can walk that tree depth-first, in post-order: push operands that have not already been pushed, and emit a gate's operator once all of its operands have been computed (a sketch of such a walk is given after the example below).
So the SSA graph (tree) for (a*b) + (c-d):

    a
     \
      *
     / \
    b   \
         +
        /
       -
      / \
     c   d
can be used to produce
push a
push b
times
push c
push d
subtract
add
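A sketch of that tree walk in Python, with an invented tuple representation (a leaf is a name, an interior node is (operator, left, right)); run on the tree above, it reproduces exactly the sequence just shown.

    def emit(node, out):
        if isinstance(node, str):
            out.append(f"push {node}")    # an operand not yet on the stack
        else:
            op, left, right = node
            emit(left, out)               # produce both inputs first...
            emit(right, out)
            out.append(op)                # ...then the zero-operand operator
        return out

    tree = ("add", ("times", "a", "b"), ("subtract", "c", "d"))
    print("\n".join(emit(tree, [])))
    # push a, push b, times, push c, push d, subtract, add

If a value is reused (a DAG rather than a tree), it generally has to be kept somewhere off the stack, which is where the local-variable loads and stores from the previous answer come back in.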