The documentation for Text.Megaparsec.Char.Lexer.charLiteral suggests using char '"' *> manyTill charLiteral (char '"') for parsing string literals (where manyTill is defined in the module Control.Applicative.Combinators in the parser-combinators library).
However, Control.Applicative.Combinators also defines between, which, as far as I can see, should do the same as the above suggestion when used like so: between (char '"') (char '"') (many charLiteral).
However, using the between parser above does not work for parsing string literals: it fails with "unexpected end of input" and "expecting '"' or literal character" (indicating that the closing quote is never detected). Why not?
Also, more generally, why isn't between pBegin pEnd (many p) equivalent to pBegin *> manyTill p pEnd?
between l r m doesn't do anything spectacular: it really just tries l, then m, then r, and gives back the result of m. So, in between (char '"') (char '"') (many charLiteral), the many charLiteral doesn't know it's not supposed to consume the closing ". The many just keeps consuming whatever its argument parser accepts, and because charLiteral accepts any character, including an unescaped ", it churns right through everything until the end of the input. Megaparsec's many does not backtrack to give up characters it has already consumed, so the second char '"' has no way of stopping this; it has to make do with what's left, i.e., it fails because there is nothing left!
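To make this concrete, here is a minimal sketch, assuming a Parsec Void String parser type. The names between' and contents are hypothetical and only for illustration; between' is a simplified stand-in for the library's between, not its actual source.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, many, parseMaybe)
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical stand-in for between: run l, then m, then r, keep m's result.
between' :: Parser open -> Parser close -> Parser a -> Parser a
between' l r m = l *> m <* r

-- The content parser from the question, on its own.
contents :: Parser String
contents = many L.charLiteral

main :: IO ()
main = do
  -- The closing quote is swallowed as ordinary content.
  print (parseMaybe contents "hello\"")
  -- prints: Just "hello\""

  -- So nothing is left for the closing char '"' to match.
  print (parseMaybe (between' (char '"') (char '"') contents) "\"hello\"")
  -- prints: Nothing
```

The first result shows the quote ending up inside the parsed string, which is exactly why the closing parser then fails at the end of the input.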
By contrast, manyTill actually checks whether the "till" parser matches before each iteration, and only applies the content parser when it doesn't. Therefore, the terminating " is never passed to charLiteral, and you get the desired behaviour.
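The "check the end parser first" behaviour can be sketched as follows. The name manyTill' is hypothetical, and this is a simplified illustration rather than the library's actual definition; it assumes the end parser fails without consuming input when it does not match, as char '"' does.

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical sketch of manyTill: try the end parser first, and only
-- run the content parser p when the end parser did not match.
manyTill' :: Parser a -> Parser end -> Parser [a]
manyTill' p end = go
  where
    go = ([] <$ end) <|> ((:) <$> p <*> go)

stringLiteral :: Parser String
stringLiteral = char '"' *> manyTill' L.charLiteral (char '"')

main :: IO ()
main = do
  print (parseMaybe stringLiteral "\"hello\"")
  -- prints: Just "hello"

  -- An escaped quote is still handled by charLiteral, because the end
  -- parser sees the backslash first and does not match there.
  print (parseMaybe stringLiteral "\"a\\\"b\"")
  -- prints: Just "a\"b"
```

Because the end parser gets the first look at every position, an unescaped " terminates the literal, while \" is left for charLiteral to decode.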