How do you deal with whitespace and comments, i.e. the fragments that are usually discarded during the lexical analysis stage? I want to allow comments everywhere in the document I am parsing. Is adding them to every elementary parser that I define the only option?
The way it is done in Text.Parsec.Token is to have every token consume the whitespace and comments that follow it. This is done with the help of the lexeme combinator:

    lexeme p = do { x <- p; whitespace; return x }

which runs a parser p, consumes the whitespace following it, and returns whatever p returned.
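A minimal sketch of that idea, where the hypothetical whitespace parser skips both spaces and "--" line comments (the comment syntax is an assumption for illustration):

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- hypothetical whitespace parser: skips spaces and "--" line comments
whitespace :: Parser ()
whitespace = skipMany (void space <|> lineComment)
  where
    lineComment = try (string "--") >> void (manyTill anyChar (void newline <|> eof))

-- every token parser is wrapped in lexeme, so it consumes trailing
-- whitespace and comments itself
lexeme :: Parser a -> Parser a
lexeme p = do { x <- p; whitespace; return x }

number :: Parser Integer
number = lexeme (read <$> many1 digit)
```

With this, parse (whitespace *> many number) "" "12 -- note\n 34" yields Right [12,34]: comments and whitespace disappear between tokens without any of the token parsers mentioning them individually.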
When you look at the source of makeTokenParser you'll see that many of the token parsers are wrapped with the lexeme combinator, e.g.:

    symbol name = lexeme (string name)

Using this approach, the documentation for lexeme points out that the only place your parser needs to consume whitespace explicitly is at the beginning of the input, to skip any whitespace before the first token.
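That pattern can be sketched as follows (here whitespace only skips spaces; extend it to skip comments as well):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

whitespace :: Parser ()
whitespace = skipMany space

lexeme :: Parser a -> Parser a
lexeme p = p <* whitespace

-- whitespace is consumed explicitly only once, at the very beginning;
-- after that, every lexeme cleans up after itself
program :: Parser [String]
program = whitespace *> many (lexeme (many1 letter)) <* eof
```

For example, parse program "" "  foo bar " returns Right ["foo","bar"] even though the input both starts and ends with spaces.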
You should use Parsec's facilities for defining a "token parser". The idea is that you describe the characteristics of your language in a LanguageDef, and then use the derived parsers in the resulting TokenParser, e.g. identifier, integer, etc. You can take the lexeme function from your TokenParser to turn any parser you might have into one that swallows all trailing whitespace. See makeTokenParser for more details.
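A sketch of that setup, assuming a toy language whose line comments start with "--" (set via the commentLine field; everything else is taken from emptyDef):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

-- derive a family of token parsers from a LanguageDef
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef { Tok.commentLine = "--" }

-- the derived parsers (integer, identifier, ...) already skip trailing
-- whitespace and comments; only the leading whitespace is skipped once
program :: Parser [Integer]
program = Tok.whiteSpace lexer *> many (Tok.integer lexer) <* eof
```

With this definition, parse program "" "-- header\n1 2 3" yields Right [1,2,3]: the leading comment and all separating whitespace are handled by the generated token parsers.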
An example is this code that I wrote. It's real-world code, so perhaps not as instructive as a good tutorial, but you can see how I define lang = makeTokenParser... and then, in the parsers that follow, use parsers such as whiteSpace lang and parens lang. parseTime is an example where I use lexeme around a "normal" parser.
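That last point can be sketched like this: the lexeme field of the TokenParser wraps a hand-written parser (a hypothetical HH:MM time parser, named parseTime here to mirror the example above):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

lang :: Tok.TokenParser ()
lang = Tok.makeTokenParser emptyDef

-- a "normal" parser for an HH:MM time, turned into a lexeme so that it
-- swallows whatever whitespace (and comments) follow it
parseTime :: Parser (Int, Int)
parseTime = Tok.lexeme lang $ do
  h <- read <$> count 2 digit
  _ <- char ':'
  m <- read <$> count 2 digit
  return (h, m)
```

For example, parse (Tok.whiteSpace lang *> parseTime) "" "  12:34  " yields Right (12,34), with the surrounding whitespace consumed by whiteSpace and lexeme rather than by the time parser itself.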