The requirements are taken from the DOT language specification. More precisely, I'm trying to parse the ID attribute, which can be, e.g., any double-quoted string (`"..."`) possibly containing escaped quotes (`\"`).
Here is a minimal example:
```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Void
import Data.Char
import Data.Text hiding (map, all, concat)

type Parser = Parsec Void Text

escape :: Parser String
escape = do
  d <- char '\\'
  c <- oneOf ['\\', '\"', '0', 'n', 'r', 'v', 't', 'b', 'f']
  return [d, c]

nonEscape :: Parser Char
nonEscape = noneOf ['\\', '\"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

identPQuoted :: Parser String
identPQuoted =
  let inner = fmap return (try nonEscape) <|> escape
  in do
    char '"'
    strings <- many inner
    char '"'
    return $ concat strings

identP :: Parser Text
identP = identPQuoted >>= return . pack

main = parseTest identP "\"foo \"bar\""
```
The above code returns `"foo "`, even though I want `foo "bar`.
I don't understand why. I thought that Megaparsec would repeatedly apply `inner` until it parses the final `"`. But it only repeatedly applies the `nonEscape` parser; the first time that fails and it tries `escape`, it appears to skip the rest of the inner string and just move on to the final quote.
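Your input text is `"foo "bar"`, which does not contain any escaped quotes. It is parsed as a complete ID of `"foo "` (followed by `bar"`, which is ignored). When `inner` reaches the second `"`, `nonEscape` rejects it and `escape` fails because it is not a backslash; neither consumes input, so `many inner` simply stops there and the closing `char '"'` matches that quote.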
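To see exactly where the match ends, the sketch below runs the question's parser (lightly compacted, same behaviour) followed by Megaparsec's `getInput`, which returns whatever input is still unconsumed. `idAndRest` is a hypothetical helper name introduced only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text, pack)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

-- The question's parsers, with the same behaviour.
escape :: Parser String
escape = do
  d <- char '\\'
  c <- oneOf ['\\', '"', '0', 'n', 'r', 'v', 't', 'b', 'f']
  return [d, c]

nonEscape :: Parser Char
nonEscape = noneOf ['\\', '"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

identP :: Parser Text
identP = do
  _ <- char '"'
  strings <- many (fmap pure (try nonEscape) <|> escape)
  _ <- char '"'
  return (pack (concat strings))

-- Hypothetical helper: the parsed ID paired with the input left unconsumed.
idAndRest :: Text -> Maybe (Text, Text)
idAndRest input =
  either (const Nothing) Just (parse ((,) <$> identP <*> getInput) "" input)

main :: IO ()
main = print (idAndRest "\"foo \"bar\"")
-- prints: Just ("foo ","bar\"")
```

The leftover `bar"` is exactly the part that `parseTest` silently ignores.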
If you want to make sure that your parser consumes all of the available input, you can use

```haskell
parseTest (identP <* eof) "..."
```
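A minimal sketch of the effect of `<* eof`, assuming the question's parser (compacted, same behaviour). `parsesFully` is a hypothetical helper that reports only success or failure, so no particular error message text is assumed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text, pack)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

escape :: Parser String
escape = do
  d <- char '\\'
  c <- oneOf ['\\', '"', '0', 'n', 'r', 'v', 't', 'b', 'f']
  return [d, c]

nonEscape :: Parser Char
nonEscape = noneOf ['\\', '"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

identP :: Parser Text
identP = do
  _ <- char '"'
  strings <- many (fmap pure (try nonEscape) <|> escape)
  _ <- char '"'
  return (pack (concat strings))

-- Hypothetical helper: True only if identP matches AND no input is left over.
parsesFully :: Text -> Bool
parsesFully input = either (const False) (const True) (parse (identP <* eof) "" input)

main :: IO ()
main = do
  print (parsesFully "\"foo \"bar\"")   -- False: bar" remains, so eof fails
  print (parsesFully "\"foo \\\"bar\"") -- True: the quote is escaped
```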
If you want to provide an ID with an escaped quote to the parser, like this ...

```
"foo \"bar"
```

... then you need to escape all of the special characters to embed them in Haskell source code:

```haskell
main = parseTest identP "\"foo \\\"bar\""
```
Inside the Haskell string literal, `\"` represents a literal `"` and `\\` represents a literal `\`.
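One more detail: the question's `escape` returns `[d, c]`, so the backslash stays in the result and the escaped input yields `foo \"bar`, not `foo "bar`. The sketch below compares that with a hypothetical variant, `escapeDrop`, which turns `\"` into a bare `"` and keeps the other escape sequences verbatim. `escapeKeep`, `escapeDrop` and `quotedWith` are illustrative names, and which escapes to unescape is an assumption you should check against the DOT specification.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text, pack)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

escapeChars :: [Char]
escapeChars = ['\\', '"', '0', 'n', 'r', 'v', 't', 'b', 'f']

-- The question's behaviour: keep the backslash and the escaped character.
escapeKeep :: Parser String
escapeKeep = do
  d <- char '\\'
  c <- oneOf escapeChars
  return [d, c]

-- Hypothetical variant: \" becomes a plain quote, other escapes stay as-is.
escapeDrop :: Parser String
escapeDrop = do
  d <- char '\\'
  c <- oneOf escapeChars
  return (if c == '"' then [c] else [d, c])

nonEscape :: Parser Char
nonEscape = noneOf ['\\', '"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

-- A quoted string whose escape handling is supplied as a parameter.
quotedWith :: Parser String -> Parser Text
quotedWith esc = do
  _ <- char '"'
  strings <- many (fmap pure (try nonEscape) <|> esc)
  _ <- char '"'
  return (pack (concat strings))

main :: IO ()
main = do
  -- The Haskell literal below is the DOT text "foo \"bar"
  print (parseMaybe (quotedWith escapeKeep) "\"foo \\\"bar\"") -- Just "foo \\\"bar"
  print (parseMaybe (quotedWith escapeDrop) "\"foo \\\"bar\"") -- Just "foo \"bar"
```

`parseMaybe` already requires the whole input to be consumed, so it behaves like `parse (p <* eof)` here.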