I want to parse text similar to this in Haskell using Megaparsec.
```
# START SKIP
def foo(a,b):
    c = 2*a # Foo
    return a + b
# END SKIP
```
where `# START SKIP` and `# END SKIP` mark the start and end of the block of text to parse.
Unlike `skipBlockComment`, I want the parser to return the lines between the start and end markers.
This is my parser.
```haskell
skip :: Parser String
skip = s >> manyTill anyChar e
  where s = string "# START SKIP"
        e = string "# END SKIP"
```
The `skip` parser works as intended.
To allow a variable amount of whitespace within the start and end markers (for example, several spaces between `#`, `START` and `SKIP`), I've tried the following:
```haskell
skip' :: Parser String
skip' = s >> manyTill anyChar e
  where s = symbol "#" >> symbol "START" >> symbol "SKIP"
        e = symbol "#" >> symbol "END" >> symbol "SKIP"
```
Using `skip'` to parse the above text gives the following error:
```
3:15:
unexpected 'F'
expecting "END", space, or tab
```
I would like to understand the cause of this error and how I can fix it.
As Alec already commented, the problem is that as soon as `e` encounters `'#'`, that character counts as consumed. The way Parsec and its derivatives work, once you have consumed any characters you are committed to that parsing branch: the `anyChar` alternative inside `manyTill anyChar e` is no longer considered, even though `e` ultimately fails here.
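A minimal sketch of this commitment, assuming Megaparsec 7 or later (where `anyChar` was replaced by `anySingle`) and a hypothetical two-character end marker `#E` standing in for `# END SKIP`. The end parser `char '#' >> char 'E'` consumes `'#'` before `char 'E'` can fail, so on input containing `#F` the whole `manyTill` fails instead of falling back to `anySingle`.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Hypothetical end marker built from two parsers. Once char '#' succeeds,
-- input has been consumed, so a later failure of char 'E' is a
-- "consumed" failure and is not backtracked.
endMarker :: Parser Char
endMarker = char '#' >> char 'E'

-- manyTill tries endMarker before each anySingle. If endMarker fails
-- without consuming, anySingle takes one character; if it fails after
-- consuming '#', the whole manyTill fails.
upToEnd :: Parser String
upToEnd = manyTill anySingle endMarker
```

With this, `parse upToEnd "" "ab#E"` returns `Right "ab"`, while `parse upToEnd "" "a#Fb#E"` returns a `Left` whose error points at the `F`, mirroring the `3:15` error above.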
You can easily request backtracking, though, by wrapping the end delimiter in `try`:
```haskell
skip' :: Parser String
skip' = s >> manyTill anyChar e
  where s = symbol "#" >> symbol "START" >> symbol "SKIP"
        e = try $ symbol "#" >> symbol "END" >> symbol "SKIP"
```
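For reference, here is a self-contained sketch of the fixed `skip'`. It assumes Megaparsec 7 or later (so `anySingle` replaces `anyChar`), `Parser = Parsec Void String`, and a space consumer `sc` that skips all whitespace and has no comment syntax. The question does not show its own `sc` or `symbol`, so these definitions are hypothetical and may differ from the ones that produced the "space, or tab" error.

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical space consumer: skips spaces, tabs and newlines;
-- no line or block comments (both set to empty).
sc :: Parser ()
sc = L.space space1 empty empty

-- Match a fixed string, then skip any trailing whitespace.
symbol :: String -> Parser String
symbol = L.symbol sc

skip' :: Parser String
skip' = s >> manyTill anySingle e
  where
    s = symbol "#" >> symbol "START" >> symbol "SKIP"
    -- try: if "#" matches but "END" or "SKIP" does not (as at "# Foo"),
    -- rewind to before the '#' so manyTill can fall back to anySingle.
    e = try $ symbol "#" >> symbol "END" >> symbol "SKIP"

-- The example text from the question.
sample :: String
sample = unlines
  [ "# START SKIP"
  , "def foo(a,b):"
  , "    c = 2*a # Foo"
  , "    return a + b"
  , "# END SKIP"
  ]
```

Here `parseTest skip' sample` prints `"def foo(a,b):\n    c = 2*a # Foo\n    return a + b\n"`. Because this `sc` also skips newlines, `symbol "SKIP"` swallows the line break after the start marker, and it would also swallow any indentation at the start of the first body line.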
This sets a “checkpoint” before `'#'` is consumed, so when `e` fails later on (in your example, at `"Foo"`), the parser acts as if no characters had matched at all.
In fact, traditional Parsec would show the same behaviour for `skip` as well. But because looking for a string and succeeding only if it matches entirely is such a common task, Megaparsec's `string` is implemented like `try . string`: if the failure occurs within that fixed string, it always backtracks.
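To see that `string` backtracks by itself, this sketch compares the question's `skip` (again with `anySingle` in place of `anyChar`, assuming Megaparsec 7 or later) with a hypothetical `skipSplit` whose end marker is split into `char '#'` followed by `string " END SKIP"`. On text containing `# Foo`, `skip` succeeds without any `try`, while `skipSplit` fails, because `char '#'` has already consumed input by the time the rest of the marker fails.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, string)

type Parser = Parsec Void String

-- The question's skip. No try needed: when string "# END SKIP" matches
-- only a prefix (such as "# " before "Foo"), it fails without consuming
-- input, so manyTill falls back to anySingle.
skip :: Parser String
skip = s >> manyTill anySingle e
  where
    s = string "# START SKIP"
    e = string "# END SKIP"

-- Hypothetical contrast: the same end marker as a compound parser.
-- char '#' consumes input before string " END SKIP" can fail, so
-- manyTill is committed and fails at "# Foo".
skipSplit :: Parser String
skipSplit = s >> manyTill anySingle e
  where
    s = string "# START SKIP"
    e = char '#' >> string " END SKIP"
```

On the input `"# START SKIP\nc = 2*a # Foo\n# END SKIP"`, `skip` returns `Right "\nc = 2*a # Foo\n"` (the newline after the start marker is kept, since plain `string` skips no whitespace), while `skipSplit` returns a `Left`.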
However, compound parsers still don't backtrack by default, unlike in attoparsec, where they do. The main reason is that if any parser can backtrack to any point, you can't get a clear point of failure to show in the error message.
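The same rule governs `<|>` in general, not just `manyTill`. In this sketch (hypothetical parsers, Megaparsec 7 or later), both alternatives begin with `'a'`. Without `try`, the first branch consumes `'a'` and then fails, so on input `"ac"` the second branch is never attempted and the error is reported at the `'c'`, where the committed branch failed. With `try` around the first branch, the second branch gets its chance.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- No backtracking: after the left branch consumes 'a' and fails on 'c',
-- (<|>) does not try the right branch.
committed :: Parser Char
committed = (char 'a' >> char 'b') <|> (char 'a' >> char 'c')

-- try makes the left branch fail as if it had consumed nothing,
-- so (<|>) moves on to the right branch.
backtracking :: Parser Char
backtracking = try (char 'a' >> char 'b') <|> (char 'a' >> char 'c')
```

The trade-off described above shows up here: `committed` gives a precise error ("expecting 'b'" at the `'c'`), whereas wrapping everything in `try` would let failures unwind to earlier, less informative positions.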