I am writing a compiler for a small imperative language. The target language is Java bytecode, and the compiler is implemented in Haskell.
I've written a frontend for the language - i.e I have a lexer, parser and typechecker. I'm having trouble figuring out how to do code generation.
I keep a data structure representing the stack of local variables. I can query this structure with the name of a local variable and get its position in the stack. This data structure is passed around as I walk the syntax tree, and variables are popped and pushed as I enter and exit new scopes.
What I having trouble figuring out is how to emit the bytecode. Emitting strings at terminals and concatenating them at higher levels seems like a poor solution, both clarity- and performance-wise.
tl;dr How do I emit bytecode while waling the syntax tree?
My first project in Haskell a few months back was to write a c compiler, and what resulted was a fairly naive approach to code generation, which I'll walk through here. Please do not take this as an example of good design for a code generator, but rather view it as a quick and dirty (and ultimately naive) way to get something that works fairly quickly with decent performance.
I began by defining an intermediate representation LIR (Lower Intermediate Representation) which closely corresponded to my instruction set (x86_64 in my case):
data LIRInst = LIRRegAssignInst LIRReg LIRExpr
| LIRRegOffAssignInst LIRReg LIRReg LIRSize LIROperand
| LIRStoreInst LIRMemAddr LIROperand
| LIRLoadInst LIRReg LIRMemAddr
| LIREnterInst LIRInt
| LIRJumpLabelInst LIRLabel
| LIRIfInst LIRRelExpr LIRLabel LIRLabel -- false, then true
| LIRCallInst LIRLabel LIRLabel -- method label, return label
| LIRCalloutInst String
| LIRRetInst [LIRLabel] String -- list of successors, and the name of the method returning from
| LIRLabelInst LIRLabel
deriving (Show, Eq, Typeable)
Next up came a monad that would handle interleaving state throughout the translation (I was blissfully unaware of our friend-the State Monad
-at the time):
newtype LIRTranslator a = LIRTranslator
{ runLIR :: Namespace -> (a, Namespace) }
instance Monad LIRTranslator where
return a = LIRTranslator (\s -> (a, s))
m >>= f = LIRTranslator (\s ->
let (a, s') = runLIR m s
in runLIR (f a) s')
along with the state that would be 'threaded' through the various translation phases:
data Namespace = Namespace
{ temp :: Int -- id's for new temporaries
, labels :: Int -- id's for new labels
, scope :: [(LIRLabel, LIRLabel)] -- current program scope
, encMethod :: String -- current enclosing method
, blockindex :: [Int] -- index into the SymbolTree
, successorMap :: Map.Map String [LIRLabel]
, ivarStack :: [(LIRReg, [CFGInst])] -- stack of ivars (see motioned code)
}
For convenience, I also specified a series of translator monadic functions, for example:
-- |Increment our translator's label counter
incLabel :: LIRTranslator Int
incLabel = LIRTranslator (\ns@(Namespace{ labels = l }) -> (l, ns{ labels = (l+1) }))
I then proceeded to recursively pattern-match my AST, fragment-by-fragment, resulting in many functions of the form:
translateBlock :: SymbolTree -> ASTBlock -> LIRTranslator [LIRInst]
translateBlock st (DecafBlock _ [] _) = withBlock (return [])
translateBlock st block =
withBlock (do b <- getBlock
let st' = select b st
declarations <- mapM (translateVarDeclaration st') (blockVars block)
statements <- mapM (translateStm st') (blockStms block)
return (concat declarations ++ concat statements))
(for translating a block of the target language's code) or
-- | Given a SymbolTree, Translate a single DecafMethodStm into [LIRInst]
translateStm st (DecafMethodStm mc _) =
do (instructions, operand) <- translateMethodCall st mc
final <- motionCode instructions
return final
(for translating a method call) or
translateMethodPrologue :: SymbolTree -> DecafMethod -> LIRTranslator [LIRInst]
translateMethodPrologue st (DecafMethod _ ident args _ _) =
do let numRegVars = min (length args) 6
regvars = map genRegVar (zip [LRDI, LRSI, LRDX, LRCX, LR8, LR9] args)
stackvars <- mapM genStackVar (zip [1..] (drop numRegVars args))
return (regvars ++ stackvars)
where
genRegVar (reg, arg) =
LIRRegAssignInst (symVar arg st) (LIROperExpr $ LIRRegOperand reg)
genStackVar (index, arg) =
do let mem = LIRMemAddr LRBP Nothing ((index + 1) * 8) qword -- ^ [rbp] = old rbp; [rbp + 8] = ret address; [rbp + 16] = first stack param
return $ LIRLoadInst (symVar arg st) mem
for an example of actually generating some LIR code. Hopefully these three examples will give you a good starting point; ultimately, you'll want to go slowly, focusing on one fragment (or intermediate type) within your AST at a time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With