
Parse between quotes with Haskell

The requirements are taken from the DOT language specification. More precisely, I'm trying to parse the [ID] attribute, which can be, e.g.,

any double-quoted string ("...") possibly containing escaped quotes (\");

The following should be a minimal example.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Text.Megaparsec
import           Text.Megaparsec.Char
import           Data.Void
import           Data.Char
import           Data.Text               hiding ( map
                                        , all
                                        , concat
                                        )

type Parser = Parsec Void Text

escape :: Parser String
escape = do
    d <- char '\\'
    c <- oneOf ['\\', '\"', '0', 'n', 'r', 'v', 't', 'b', 'f']
    return [d, c]

nonEscape :: Parser Char
nonEscape = noneOf ['\\', '\"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

identPQuoted :: Parser String
identPQuoted =
    let inner = fmap return (try nonEscape) <|> escape
    in  do
      char '"'
      strings <- many inner
      char '"'
      return $ concat strings

identP :: Parser Text
identP = identPQuoted >>= return . pack

main = parseTest identP "\"foo \"bar\""

The above code fails on the second quote: it returns "foo " even though I want foo "bar.

I don't understand why. I thought that megaparsec would repeatedly apply inner until it reaches the final ". Instead, it only repeatedly applies the nonEscape parser; the first time that fails and it tries escape, it appears to skip the rest of the inner string and move straight on to the closing quote.

Vey asked Mar 06 '23 10:03



1 Answer

Your input text is "foo "bar", which does not contain any escaped quotes. It is parsed as a complete ID of "foo " (followed by bar", which is ignored).

If you want to make sure that your parser consumes all of the available input, you can use

parseTest (identP <* eof) "..."
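As a rough sketch of the difference eof makes, the program below runs a condensed copy of the question's parser on the original input, once without and once with eof. The names escapeSeq, plainChar, quotedId and unescapedInput are made up for this illustration; the grammar is meant to match identP from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Data.Text            (Text, pack)
import           Data.Void            (Void)
import           Text.Megaparsec
import           Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

-- A backslash followed by one escapable character, kept verbatim.
escapeSeq :: Parser String
escapeSeq = do
  d <- char '\\'
  c <- oneOf ['\\', '"', '0', 'n', 'r', 'v', 't', 'b', 'f']
  return [d, c]

-- Any character that may appear unescaped inside the quotes.
plainChar :: Parser Char
plainChar = noneOf ['\\', '"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

-- Condensed version of the question's identP (hypothetical name).
quotedId :: Parser Text
quotedId = pack . concat <$> (char '"' *> many piece <* char '"')
  where
    piece :: Parser String
    piece = escapeSeq <|> fmap (: []) plainChar

-- The literal from the question; the parser sees: "foo "bar"
unescapedInput :: Text
unescapedInput = "\"foo \"bar\""

main :: IO ()
main = do
  -- Succeeds on the prefix "foo " and silently leaves bar" unconsumed.
  parseTest quotedId unescapedInput
  -- Fails, because input remains after the closing quote.
  parseTest (quotedId <* eof) unescapedInput
```

With eof in place, leftover input becomes a reported parse error instead of being dropped, which makes this kind of mistake much easier to spot.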

If you want to provide an ID with an escaped quote to the parser, like this ...

"foo \"bar"

... then you need to escape all of the special characters to embed them in Haskell source code:

main = parseTest identP "\"foo \\\"bar\""

In a Haskell string literal, \" represents a literal " and \\ represents a literal \, so \\\" yields the two characters \".
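One way to check what the parser actually receives is to print both literals. This is only an illustration; questionInput and answerInput are names invented here, not part of the original code.

```haskell
module Main where

-- The literal used in the question: no backslash reaches the parser.
questionInput :: String
questionInput = "\"foo \"bar\""

-- The literal suggested in the answer: \\ becomes \ and \" becomes ".
answerInput :: String
answerInput = "\"foo \\\"bar\""

main :: IO ()
main = do
  putStrLn questionInput  -- prints: "foo "bar"
  putStrLn answerInput    -- prints: "foo \"bar"
```

Only the second string contains a backslash, so only it can trigger the escape parser.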

melpomene answered Mar 11 '23 21:03
