Per the megaparsec docs, "Since version 8, reporting multiple parse errors at once has become much easier." I haven't been able to find a single complete example of doing it. The only one I could find is this. However, it only shows how to parse a newline-delimited toy language, and it does not show how to combine multiple errors into a ParseErrorBundle. This SO discussion is not conclusive.
You want to use withRecovery to recover from Megaparsec-generated errors in conjunction with registerParseError (or registerFailure or registerFancyFailure) to "register" those errors (or your own generated errors) for delayed processing.
At the end of the parse, if no parse errors have been registered, then parsing succeeds, while if one or more parse errors have been registered, they are all printed at once. If you register parse errors and then also trigger an unrecovered error, parsing immediately terminates and the registered errors and the final unrecovered error will all be printed.
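To see the registration half in isolation, here is a minimal sketch that also shows where the combined errors end up: `parse` returns a `Left` holding a `ParseErrorBundle`, whose `bundleErrors` field is a non-empty list of every registered error plus any final unrecovered one. The toy grammar (letters are fine, digits are flagged) and the names `digitAsError`, `word`, and `errorCount` are made up for illustration and are not part of Megaparsec.

```haskell
import Data.Void (Void)
import qualified Data.List.NonEmpty as NE
import qualified Data.Set as Set
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Hypothetical rule: digits are accepted, but each one registers a
-- delayed error pointing at the digit's own offset.
digitAsError :: Parser Char
digitAsError = do
  o <- getOffset
  c <- digitChar
  registerParseError
    (FancyError o (Set.singleton (ErrorFail ("digit not allowed: " ++ [c]))))
  pure c

-- Parsing keeps going after each registered error.
word :: Parser String
word = many (letterChar <|> digitAsError) <* eof

-- Count the errors collected in the bundle (0 when parsing succeeds).
errorCount :: String -> Int
errorCount input = case parse word "" input of
  Left bundle -> NE.length (bundleErrors bundle)
  Right _     -> 0
```

With these definitions, `errorCount "abc"` should be 0 and `errorCount "ab1c2"` should be 2, even though `word` itself ran to the end of the input in both cases. Passing the bundle to `errorBundlePretty` instead gives the printed form that `parseTest` shows.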
Here's a very simple example that parses a comma-separated list of numbers:
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

numbers :: Parser [Int]
numbers = sepBy number comma <* eof
  where number = read <$> some digitChar
        comma  = recover $ char ','
        -- recover to next comma
        recover = withRecovery $ \e -> do
          registerParseError e
          some (anySingleBut ',')
          char ','
On good input:
> parseTest numbers "1,2,3,4,5"
[1,2,3,4,5]
and on input with multiple errors:
> parseTest numbers "1.2,3e5,4,5x"
1:2:
  |
1 | 1.2,3e5,4,5x
  |  ^
unexpected '.'
expecting ','

1:6:
  |
1 | 1.2,3e5,4,5x
  |      ^
unexpected 'e'
expecting ','

1:12:
  |
1 | 1.2,3e5,4,5x
  |            ^
unexpected 'x'
expecting ',', digit, or end of input
There are some subtleties here. For the following, only the first parse error is handled:
> parseTest numbers "1,2,e,4,5x"
1:5:
  |
1 | 1,2,e,4,5x
  |     ^
unexpected 'e'
expecting digit
and you have to study the parser carefully to see why. The sepBy successfully applies the number and comma parsers in alternating sequence to parse "1,2,". When it gets to e, it applies the number parser, which fails (because some digitChar requires at least one digit char). This is an unrecovered error, so parsing ends immediately with no other errors registered, so only the one error is printed.
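One way to get both errors reported for that input is to make the number parser recoverable as well. The following is only a sketch under some assumptions of mine: the result type changes to `[Maybe Int]` so a bad field can be represented as `Nothing`, and the names `numbersLenient`, `skipBad`, and `errorOffsets` are hypothetical. The number recovery skips up to (but not including) the next comma, so the separator is still there for `sepBy` to consume.

```haskell
import Data.Void (Void)
import qualified Data.List.NonEmpty as NE
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- Variant of `numbers` where a malformed field is also recovered.
numbersLenient :: Parser [Maybe Int]
numbersLenient = sepBy number comma <* eof
  where
    number :: Parser (Maybe Int)
    number = withRecovery skipBad (Just . read <$> some digitChar)

    -- Register the error, skip the bad field, leave the comma in place.
    skipBad :: ParseError String Void -> Parser (Maybe Int)
    skipBad e = do
      registerParseError e
      _ <- takeWhileP Nothing (/= ',')
      pure Nothing

    -- Same comma recovery as in the original example.
    comma :: Parser Char
    comma = withRecovery recoverComma (char ',')

    recoverComma :: ParseError String Void -> Parser Char
    recoverComma e = do
      registerParseError e
      _ <- some (anySingleBut ',')
      char ','

-- Offsets (0-based) of all errors reported for an input.
errorOffsets :: String -> [Int]
errorOffsets input = case parse numbersLenient "" input of
  Left bundle -> map errorOffset (NE.toList (bundleErrors bundle))
  Right _     -> []
```

With this variant, `"1,2,e,4,5x"` should report errors at both the `e` and the `x`. The trade-off is that an empty field, including completely empty input, now registers an error, because the recovery for number can succeed without consuming anything.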
Also, if you dropped the <* eof from the definition of numbers (e.g., to make it part of a larger parser), you'd discover that:
> parseTest numbers "1,2,3.4,5"
gives a parse error on the period, but:
> parseTest numbers "1,2,3.4"
parses fine. On the other hand:
> parseTest numbers "1,2,3.4\n hundreds of lines without commas\nfinal line, with comma"
gives parse errors on the period and the comma at the end of the file.
The issue is that the comma parser is used by sepBy to determine when the comma-separated list of numbers has ended. If the parser succeeds (which it can do via recovery, gobbling up hundreds of lines to the next comma), sepBy will try to keep running; if the parser fails (both initially, and because the recovery code can't find a comma after scanning the entire file), sepBy will complete.
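One possible mitigation, sketched here under the assumption that a list never spans more than one line, is to stop the recovery scan at a newline so it cannot run off into the rest of the file. The names `numbersOnLine` and `recoverOnLine` are hypothetical, and whether a line boundary is the right limit depends entirely on your grammar.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- `numbers` without the trailing eof, with recovery confined to one line.
numbersOnLine :: Parser [Int]
numbersOnLine = sepBy number comma
  where
    number :: Parser Int
    number = read <$> some digitChar

    comma :: Parser Char
    comma = withRecovery recoverOnLine (char ',')

    -- Skip junk up to the next comma, but give up at a newline.
    -- If no comma is found on this line, the recovery itself fails,
    -- the comma parser fails, and sepBy ends the list.
    recoverOnLine :: ParseError String Void -> Parser Char
    recoverOnLine e = do
      registerParseError e
      _ <- takeWhile1P Nothing (\c -> c /= ',' && c /= '\n')
      char ','
```

With this version, an input like `"1,2,3.4\nmore text, here"` should yield `[1,2,3]` and leave the rest for the enclosing parser, rather than scanning ahead to the comma on the second line. An error registered inside a recovery that then fails is discarded along with the rest of the recovery's state, so no stray error is reported for the period in that case.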
Ultimately, writing recoverable parsers is kind of tricky.