I'm learning to use Alex and Happy to write a small compiler. I want to maintain line and column information for my AST nodes so that I can provide meaningful error messages to the user. To illustrate how I plan to do it, I wrote a small example (see code below), and I'd like to know if the way I approached the problem (having AlexPosn attached to the tokens, attaching a polymorphic attribute field to AST nodes, using tkPos and astAttr) is good style or if there are better ways to handle position information.
Lexer.x:
{
module Lexer where
}
%wrapper "posn"
$white = [\ \t\n]
tokens :-
$white+ ;
[xX] { \pos s -> MkToken pos X }
"+" { \pos s -> MkToken pos Plus }
"*" { \pos s -> MkToken pos Times }
"(" { \pos s -> MkToken pos LParen }
")" { \pos s -> MkToken pos RParen }
{
data Token = MkToken AlexPosn TokenClass
deriving (Show, Eq)
data TokenClass = X
| Plus
| Times
| LParen
| RParen
deriving (Show, Eq)
tkPos :: Token -> (Int, Int)
tkPos (MkToken (AlexPn _ line col) _) = (line, col)
}
Parser.y:
{
module Parser where
import Lexer
}
%name simple
%tokentype { Token }
%token
'(' { MkToken _ LParen }
')' { MkToken _ RParen }
'+' { MkToken _ Plus }
'*' { MkToken _ Times }
x { MkToken _ X }
%%
Expr : Term '+' Expr { NAdd $1 $3 (astAttr $1) }
| Term { $1 }
Term : Factor '*' Term { NMul $1 $3 (astAttr $1) }
| Factor { $1 }
Factor : x { NX (tkPos $1) }
| '(' Expr ')' { $2 }
{
data AST a = NX a
| NMul (AST a) (AST a) a
| NAdd (AST a) (AST a) a
deriving (Show, Eq)
astAttr :: AST a -> a
astAttr (NX a) = a
astAttr (NMul _ _ a) = a
astAttr (NAdd _ _ a) = a
happyError :: [Token] -> a
happyError _ = error "parse error"
}
Main.hs:
module Main where
import Lexer
import Parser
main :: IO ()
main = do
s <- getContents
let toks = alexScanTokens s
print $ simple toks
I personally would be pretty ok with the style you've described. However, it is very manual and I was hoping to at least provide one alternative that might be easier to manage.
If you look a little further down the documentation for alex wrappers, you'll notice that the monad and monadstate wrappers both contain position information. The downside is that you now have the entire thing wrapped in a monad and it complicates the parser slightly. However, by wrapping it in a monad, the result of the parse is an Alex a
which means you have full access to the line and column information when you create your ast nodes. Now this simply removes some of the boiler plate from the lexer and doesn't do much more.
By doing this, you could also carry around the AlexState with your token, but that might be unnecessary.
If you need help actually fixing the parser to handle a monad/monadstate wrapper, I wrote a response on how I managed to get it working here: How to use an Alex monadic lexer with Happy?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With