Consider the usage of these different parser combinators.
import Control.Applicative.Combinators
import Text.Regex.Applicative
main :: IO ()
main = do
let parser1 = sym '"' *> manyTill anySym (sym '"')
print $ match parser1 "\"abc\""
let parser2 = sym '"' *> many anySym <* sym '"'
print $ match parser2 "\"abc\""
import Control.Applicative.Combinators
import Text.ParserCombinators.ReadP hiding(many, manyTill)
main :: IO ()
main = do
let parser1 = char '"' *> manyTill get (char '"')
print $ readP_to_S parser1 "\"abc\""
let parser2 = char '"' *> many get <* char '"'
print $ readP_to_S parser2 "\"abc\""
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative.Combinators
import Data.Attoparsec.Text hiding(manyTill)
main :: IO ()
main = do
let parser1 = char '"' *> manyTill anyChar (char '"')
print $ parseOnly parser1 "\"abc\""
let parser2 = char '"' *> many anyChar <* char '"'
print $ parseOnly parser2 "\"abc\""
import Control.Applicative.Combinators
import Text.Megaparsec hiding(many, manyTill)
import Data.Void
main :: IO ()
main = do
let parser1 = single '"' *> manyTill anySingle (single '"') :: Parsec Void String String
print $ parseMaybe parser1 "\"abc\""
let parser2 = single '"' *> many anySingle <* single '"' :: Parsec Void String String
print $ parseMaybe parser2 "\"abc\""
With all four of them, the manyTill
parser successfully matches abc
, since this doesn't depend on backtracking. With regex-applicative
and ReadP
, the many
parser also successfully matches abc
, since they both backtrack by default. With megaparsec
, the many
parser fails to match, since it doesn't backtrack by default. So far, everything makes sense. However, with attoparsec
, the many
parser fails to match, even though it does backtrack: its documentation says "attoparsec parsers always backtrack on failure" and "if you feed incremental input to an a parser, it will require memory proportional to the amount of input you supply. (This is necessary to support arbitrary backtracking.)". Why is this? Isn't that supposed to be exactly what backtracking is?
The meaning of "backtracking" in the Attoparsec documentation is different than the meaning of backtracking for the other backtracking parsers.
It helps to review what "backtracking" means when using try
for a Parsec or Megaparsec parser. These parsers have a concept of failing after consuming input ("consume err" = cerr) versus failing after consuming nothing ("empty err" = eerr). For these parsers, the p <|> q
alternative operator treats failure of p
differently if it's cerr (fail the whole p <|> q
immediately) versus eerr (try the alternative q
instead). The try
function backtracks by converting cerr to eerr. That is, try p <|> q
will "backtrack" an erroneous consumption of the input stream in the event p
fails with cerr. It's a single step of backtracking on failure within alternatives (though with nested try
calls, multiple steps of backtracking can be performed in a sequence/cascade of parse failures).
Attoparsec doesn't differentiate betweeen cerr and eerr, so it's as if all parsers are surrounded by try
calls. This means that it automatically performs multiple steps of backtracking on failure within alternatives.
ReadP
backtracks implicitly by simultaneously evaluating every possible parse in parallel, discarding those that ever fail, and picking the "first" one that remains. It "backtracks" failures over a tree of all possible parses, whether the failures were generated in the context of an alternative or not.
It turns out that "multiple steps of backtracking on failure within alternatives" is a more modest form of backtracking than "backtracking over a tree of all possible parses".
A couple of simplified examples may help show the difference. Consider the parser:
(anyChar *> char 'a') <|> char 'b'
and the input string "bd"
. This parser fails with Parsec/Megaparsec. The left-hand alternative consumes the "b"
with anyChar
before failing, having consumed input (cerr), and the whole parser fails. This works fine with Attoparsec, though: the left-hand alternative fails at char 'a'
, and Attoparsec backtracks on this failure within an alternative to try char 'b'
which succeeds. It also works with ReadP
which constructs all possible parses in parallel, before discarding the parse from the left-hand alternative when char 'a'
fails, resulting in a single successful parse by char 'b'
.
Now, consider the parser:
(anyChar <|> pure '*') *> char 'b'
and the input string "b"
. (Recall that pure '*'
consumes nothing and always succeeds.) This parser fails with Parsec/Megaparsec, because anyChar
parses the "b"
, the pure '*'
is ignored, and the empty string isn't matched by char 'b'
. It also fails with Attoparsec: anyChar
successfully parses the "b"
, and there's no failure in the context of an alternative, so no backtracking to try the pure '*'
alternative. The attempt to parse the empty string with char 'b'
subsequently fails. (This failure, if it occurs in the context of another alternative might result in backtracking of that alternative, but never reconsideration of this pure '*'
alternative.)
In contrast, this parses fine with ReadP
. ReadP
parses the alternatives in parallel, considering both anyChar
parsing the "b"
and pure '*'
parsing nothing. When the char 'b'
parse is tried, it fails on the former but succeeds on the latter.
Back to your example. When parsing with Attoparsec, because:
many p = ((:) <$> p <*> many p) <|> pure []
the left alternative (:) <$> anyChar <*> many anyChar
will continue to successfully match all the way up to and including the point where anyChar
matches the closing quotation mark. At EOF, the left-hand side will fail (without consuming input, though Attoparsec doesn't care about that), and the right-hand side will succeed. The only failure within an alternative is at EOF, which didn't consume anything anyway, so Attoparsec's automatic "backtracking" plays no role; Megaparsec would have done the same thing. Anyway, once this many anyChar
has succeeded, it won't be revisited, even if the terminating char '"'
subsequently fails.
So, that's why you need manyTill
to explicitly watch for the terminating character.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With