Let's say I'm implementing a bytecode compiler, similar to Lua's/Python's... so on.
I'm traversing the AST, generating bytecode instructions, and I run into a break
inside of an if-else
inside of a while
loop:
while (cond1)
if (cond2)
...
else
break
(I tried writing out equivalent bytecode but it didn't look too helpful.)
The point is, there are at least 4 "jump" instructions in that example, and I
want to find an elegant solution to filling in the jump addresses as I compile the AST...
I don't know the jump address for the while
loop or the break
until after I've fully
"compiled" the inner statements.
I'm not reading the dragon book just yet.
If I'm recursively compiling the AST, when I reach a break
statement inside of some arbitrary number of loops and if-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
Bytecode compiler for the BASIC programming language (written in Haskell) Welcome to key, the programming language. An Interpreter written in Go that processes a psuedo JavaScript language. A Python bytecode compiler written in Python.
Bytecode instructions are oriented towards the high-level programming language, from which they are compiled. Often this makes the bytecode so similar to the program it originated from, that it might be possible to reconstruct the complete source code from the bytecode.
This approach lets each system interpret the same bytecode files. The bytecode itself is in a binary format that consists of constants, references and numeric codes. The Java virtual machine interprets bytecode and converts it to machine language that is platform-specific.
When source code is compiled, it is eventually converted to IL bytecode rather than to machine instruction assembly code, as an additional step in the code compilation and execution process. The runtime compiler's responsibility is to convert source code from languages such as Java, C#, VB.NET and others to IL bytecode.
The principle you need is called "backpatching": fill in a dummy value for the forward jump, generate the code for the body of the statement, and then go back and replace the dummy value with the real one at the end.
e.g.
# start of codegen for 'while'
L1:
[evaluate cond1]
jne L2 # forward reference: use a dummy value for now
# start of codegen for 'if ... else'
L3:
[evaluate cond2]
jne L4 # another forward reference: use a dummy value for now
[code in 'if' branch]
j L5 # another forward reference: use a dummy value for now
L4:
[code in 'else' branch before 'break']
j L2
[code in 'else' branch after 'break']
L5: # end of 'if ... else'
# now go back and fill in references to L4, L5 inside the 'if ... else' block
# end of codegen for 'if ... else'
# codegen for 'while' continues...
j L1 # loop
L2: # end of 'while' loop
# now go back and fill in references to L2 inside the 'while' block
# end of codegen for 'while'
Regarding your edit:
If I'm recursively compiling the AST, when I reach a
break
statement inside of some arbitrary number of loops andif-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
The target of the jump which implements the break
statement is the end of the innermost enclosing loop; yes, you could track that with an explicit external stack.
But, if you have a recursive AST walking function, you already have an implicit stack - the call frames of the recursive function invocations - so you probably don't need an explicit one as well.
e.g. in
...
while (outer)
...
if (outer_done)
break
...
while (inner)
...
if (inner_done)
break
...
[break from inner 'while' ends up here]
...
if (outer_done_2)
break
...
[break from outer 'while' ends up here]
...
the entirety of the code generation for the inner while
loop happens within a recursive walk of the AST for the body of the outer while
loop. Inside this recursive call, you only need to care about the inner loop, not the outer one.
So, all you need to do is:
while
break
that appears within the body of this while
e.g. something a bit like this:
codegen for while:
save previous break backpatch list
initialise break backpatch list as empty
perform codegen for evaluating condition
perform codegen for body statements
apply backpatches
restore previous break backpatch list
The current break
backpatch list must be part of some state which is passed to all the codegen functions (the codegen for break
needs to append to it). But the saved list could just be tracked as a local variable of the while
codegen function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With