I'd like to implement this grammar rule using Haskell's parsec library:
((a | b | c)* (a | b))?
Which is a parser rule that accepts an optional (i.e. potentially empty) string. If the string it acccepts is not empty, then it can be consumed by passing through zero or more occurrences of the a b or c parsers, but the accepted string by the outer most ? optional parser must be consumed either by parser a or b, but not c. Here's an example:
module Main where
import Text.Parsec
import Text.Parsec.Text
a,b,c :: GenParser () Char
a = char 'a'
b = char 'b'
c = char 'c'
-- ((a | b | c)* (a | b))?
myParser = undefined
shouldParse1,shouldParse2,shouldParse3,
      shouldParse4,shouldFail :: Either ParseError String
-- these should succeed
shouldParse1 = runParser myParser () "" "" -- because ? optional
shouldParse2 = runParser myParser () ""  "b"
shouldParse3 = runParser myParser () "" "ccccccb"
shouldParse4 = runParser myParser () "" "aabccab"
-- this should fail because it ends with a 'c'
shouldFail = runParser myParser () "" "aabccac"
main = do
  print shouldParse1
  print shouldParse2
  print shouldParse3
  print shouldParse4
  print shouldFail
A first attempt might look like this:
myParser = option "" $ do
  str <- many (a <|> b <|> c)
  ch  <- a <|> b
  return (str ++ [ch])
But the many just consumes all 'a' 'b' and 'c' characters in each test case, leaving a <|> b no characters to consume.
The question:
Using parsec combinators, what is the correct implementation of ((a | b | c)* (a | b))? to define myParser?
We can also state this slightly different: c in your parser may only succeed if it's followed by any token, which can be done with a single lookAhead:
myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With