I'm starting to learn Alex and believe I've gotten to the point where stateful context would be helpful, but I'm not entirely sure how to go about it. I'm attempting to lex a limited subset of erlang binaries. With the following lexer:
{
module Main (main, Token(..), AlexPosn(..), alexScanTokens, token_posn) where
}
%wrapper "posn"
$digit = 0-9 -- digits
$alpha = [a-zA-Z] -- alphabetic characters
$dbl_quote = \"
tokens :-
$white+ ;
"," { tok (\p s -> Comma p) }
"<<" { tok (\p s -> BinaryOpen p) }
">>" { tok (\p s -> BinaryClose p) }
$dbl_quote [^$dbl_quote]* $dbl_quote { tok (\p s -> ErlStr p (init (tail s))) }
$digit+ { tok (\p s -> ErlInt p (read s)) }
{
-- action helpers:
tok :: (AlexPosn -> String -> Token) -> AlexPosn -> String -> Token
tok f p s = f p s
data Token =
Comma AlexPosn |
BinaryOpen AlexPosn |
BinaryClose AlexPosn |
ErlInt AlexPosn Integer |
ErlStr AlexPosn String
deriving (Eq, Show)
token_posn :: Token -> AlexPosn
token_posn (Comma p) = p
token_posn (BinaryOpen p) = p
token_posn (BinaryClose p) = p
token_posn (ErlInt p _) = p
token_posn (ErlStr p _) = p
main :: IO ()
main = do
s <- getContents
print (alexScanTokens s)
}
I do pretty well. For instance,
> alex so_erlang_lexer.x && ghc --make -o erlexer so_erlang_lexer.hs && echo '<<"100", 1>>' | ./erlexer
[1 of 1] Compiling Main ( so_erlang_lexer.hs, so_erlang_lexer.o )
Linking erlexer ...
[BinaryOpen (AlexPn 0 1 1),ErlStr (AlexPn 2 1 3) "100",Comma (AlexPn 7 1 8),ErlInt (AlexPn 9 1 10) 1,BinaryClose (AlexPn 10 1 11)]
I would prefer to have the lexed return be equivalent to Binary [ErlStr "100", ErlInt 1]
, but I haven't been able to find a lexer which uses start codes that clicks in my head.
Would someone be so kind as to guide me a bit?
I am not exactly sure what you are trying to do with lexers, and knowledgable enough to guide you about this (but if all you need is filter useless tokens, alex's monadic interface seems a overkill), anyway here's an example code to use AlexUserState
to accumulate the selected tokens with the "monadUserState" wrapper.
{
module Main (main) where
}
%wrapper "monadUserState"
$digit = 0-9 -- digits
$alpha = [a-zA-Z] -- alphabetic characters
$dbl_quote = \"
tokens :-
$white+ ;
"," { ignoreToken }
">" { ignoreToken }
$dbl_quote [^$dbl_quote]* $dbl_quote { pushToken $ ErlStr . init . tail }
$digit+ { pushToken $ ErlInt . read }
{
alexEOF :: Alex ()
alexEOF = return ()
-- some useful interaces to the Alex monad (which is naturally an instance of state monad)
modifyUserState :: (AlexUserState -> AlexUserState) -> Alex ()
modifyUserState f = Alex (\s -> let st = alex_ust s in Right (s {alex_ust = f st},()))
getUserState :: Alex AlexUserState
getUserState = Alex (\s -> Right (s,alex_ust s))
-- Token definition minus position information for simplicity
data Token =
Comma |
BinaryOpen |
BinaryClose |
ErlInt Integer |
ErlStr String
deriving (Eq, Show)
newtype AlexUserState = Binary [Token]
deriving (Eq, Show)
alexInitUserState :: AlexUserState
alexInitUserState = Binary []
-- action helpers:
pushToken :: (String -> Token) -> AlexAction ()
pushToken tokenizer =
\(posn,prevChar,pending,s) len -> modifyUserState (push $ take len s) >> alexMonadScan
where
-- Here tokens are accumulated in reverse order for efficiency.
-- You need a more powerful data structure like Data.Sequence to preserve the order.
push :: String -> AlexUserState -> AlexUserState
push s (Binary ts) = Binary (tokenizer s : ts)
ignoreToken :: AlexAction ()
ignoreToken _ _ = alexMonadScan
runAlexScan :: String -> Either String AlexUserState
runAlexScan s = runAlex s $ alexMonadScan >> getUserState
main :: IO ()
main = getContents >>= print . runAlexScan
}
But I guess the main problem is that you seem to have not familiarized yourself enough yet with the concept and usage of monads in Haskell. Alex's monadic interface is actually very natural and typical of state monads, and once you've got some coding experiance, it's something you'd easily guess by just skimming the generated code. (And in case you guessed wrong, the type checker will most likely spot the relevant error.)
For this, since there seem to be numerous good questions and answers about monads here, I just refer to Real World Haskell (whose chapter about profiling was particularly helpful for me.)
But if you happen to have some knowledge of Category Theory already, possibly the quickest way to learn monads is to dive in some relevant papers directly (and please recall that monad in category theory is a generalization of actions,as in action of groups, which naturally fit in programming context.) For this please see this list of papers about monads and arrows, which includes introductory papers and tutorials for people with some technical background.
BTW, I have just started to learn Erlang. Could you kindly guide me a bit about it? Haven't the lack of static typing bitten you? Have you tried cloud-haskell and compared it to Erlang? And in which language do you feel most productive in the context of distributed programming?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With