Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I effectively use the start code features of Alex?

I'm starting to learn Alex and believe I've gotten to the point where stateful context would be helpful, but I'm not entirely sure how to go about it. I'm attempting to lex a limited subset of erlang binaries. With the following lexer:

{
module Main (main, Token(..), AlexPosn(..), alexScanTokens, token_posn) where
}

%wrapper "posn"

$digit = 0-9      -- digits                                                                            
$alpha = [a-zA-Z] -- alphabetic characters                                                             
$dbl_quote = \"

tokens :-

  $white+                        ;
  ","                            { tok (\p s -> Comma p) }
  "<<"                           { tok (\p s -> BinaryOpen p) }
  ">>"                           { tok (\p s -> BinaryClose p) }
  $dbl_quote [^$dbl_quote]* $dbl_quote { tok (\p s -> ErlStr p (init (tail s))) }
  $digit+                        { tok (\p s -> ErlInt p (read s)) }

{
-- action helpers:                                                                                     
tok :: (AlexPosn -> String -> Token) -> AlexPosn -> String -> Token
tok f p s = f p s

data Token =
  Comma       AlexPosn |
  BinaryOpen  AlexPosn |
  BinaryClose AlexPosn |
  ErlInt   AlexPosn Integer |
  ErlStr   AlexPosn String
  deriving (Eq, Show)

token_posn :: Token -> AlexPosn
token_posn (Comma    p) = p
token_posn (BinaryOpen  p) = p
token_posn (BinaryClose p) = p
token_posn (ErlInt   p _) = p
token_posn (ErlStr   p _) = p

main :: IO ()
main = do
  s <- getContents
  print (alexScanTokens s)
}

I do pretty well. For instance,

> alex so_erlang_lexer.x  && ghc --make -o erlexer so_erlang_lexer.hs && echo '<<"100", 1>>' | ./erlexer 
[1 of 1] Compiling Main             ( so_erlang_lexer.hs, so_erlang_lexer.o )
Linking erlexer ...
[BinaryOpen (AlexPn 0 1 1),ErlStr (AlexPn 2 1 3) "100",Comma (AlexPn 7 1 8),ErlInt (AlexPn 9 1 10) 1,BinaryClose (AlexPn 10 1 11)]

I would prefer to have the lexed return be equivalent to Binary [ErlStr "100", ErlInt 1], but I haven't been able to find a lexer which uses start codes that clicks in my head.

  • GHC's lexer referenced here doesn't use any Alex wrappers.
  • Alex's own documentation on its monad and monadUserState wrappers here leaves me uncertain which I should choose and how I might make use of either.
  • Alex's example for tiger is the most promising, but it's so large I'm having trouble clearing up my ignorance.
  • This question makes use of the monad parser, but doesn't seem to be using the state features of it.

Would someone be so kind as to guide me a bit?

like image 750
troutwine Avatar asked Oct 03 '12 22:10

troutwine


1 Answers

I am not exactly sure what you are trying to do with lexers, and knowledgable enough to guide you about this (but if all you need is filter useless tokens, alex's monadic interface seems a overkill), anyway here's an example code to use AlexUserState to accumulate the selected tokens with the "monadUserState" wrapper.



{
module Main (main) where
}

%wrapper "monadUserState"

$digit = 0-9      -- digits                                                                            
$alpha = [a-zA-Z] -- alphabetic characters                                                             
$dbl_quote = \"

tokens :-

  $white+                              ;
  ","                                  { ignoreToken  }
  ">"                                 { ignoreToken }
  $dbl_quote [^$dbl_quote]* $dbl_quote { pushToken  $ ErlStr . init . tail  }
  $digit+                              { pushToken  $ ErlInt . read }

{

alexEOF :: Alex ()
alexEOF = return ()

-- some useful interaces to the Alex monad (which is naturally an instance of state monad)
modifyUserState :: (AlexUserState -> AlexUserState) -> Alex ()
modifyUserState f = Alex (\s -> let st = alex_ust s in Right (s {alex_ust = f st},()))

getUserState ::  Alex AlexUserState
getUserState = Alex (\s -> Right (s,alex_ust s))

-- Token definition minus position information for simplicity
data Token =
  Comma        |
  BinaryOpen   |
  BinaryClose  |
  ErlInt    Integer |
  ErlStr    String
  deriving (Eq, Show)

newtype AlexUserState = Binary [Token]
  deriving (Eq, Show)

alexInitUserState :: AlexUserState
alexInitUserState = Binary []


-- action helpers:                                                                                     
pushToken :: (String -> Token) -> AlexAction ()
pushToken tokenizer = 
  \(posn,prevChar,pending,s) len -> modifyUserState (push $ take len s) >> alexMonadScan
    where
       -- Here tokens are accumulated in reverse order for efficiency.
       -- You need a more powerful data structure like Data.Sequence to preserve the order.
       push :: String -> AlexUserState -> AlexUserState
       push s (Binary ts) = Binary (tokenizer s : ts) 

ignoreToken :: AlexAction ()
ignoreToken _ _   = alexMonadScan


runAlexScan :: String -> Either String AlexUserState
runAlexScan s = runAlex s $ alexMonadScan >> getUserState


main :: IO ()
main = getContents >>= print . runAlexScan

}

But I guess the main problem is that you seem to have not familiarized yourself enough yet with the concept and usage of monads in Haskell. Alex's monadic interface is actually very natural and typical of state monads, and once you've got some coding experiance, it's something you'd easily guess by just skimming the generated code. (And in case you guessed wrong, the type checker will most likely spot the relevant error.)

For this, since there seem to be numerous good questions and answers about monads here, I just refer to Real World Haskell (whose chapter about profiling was particularly helpful for me.)

But if you happen to have some knowledge of Category Theory already, possibly the quickest way to learn monads is to dive in some relevant papers directly (and please recall that monad in category theory is a generalization of actions,as in action of groups, which naturally fit in programming context.) For this please see this list of papers about monads and arrows, which includes introductory papers and tutorials for people with some technical background.

BTW, I have just started to learn Erlang. Could you kindly guide me a bit about it? Haven't the lack of static typing bitten you? Have you tried cloud-haskell and compared it to Erlang? And in which language do you feel most productive in the context of distributed programming?

like image 123
mnish Avatar answered Oct 03 '22 07:10

mnish