I'd like to implement this grammar rule using Haskell's parsec library:
((a | b | c)* (a | b))?
This is a parser rule that accepts an optional (i.e. potentially empty) string. If the string it accepts is not empty, it can be consumed by passing through zero or more occurrences of the a, b or c parsers, but the last character of the string accepted by the outermost ? (optional) parser must be consumed by either parser a or b, not c. Here's an example:
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- ((a | b | c)* (a | b))?
myParser :: Parser String
myParser = undefined

shouldParse1, shouldParse2, shouldParse3,
  shouldParse4, shouldFail :: Either ParseError String

-- these should succeed
shouldParse1 = runParser myParser () "" ""  -- because ? is optional
shouldParse2 = runParser myParser () "" "b"
shouldParse3 = runParser myParser () "" "ccccccb"
shouldParse4 = runParser myParser () "" "aabccab"

-- this should fail because it ends with a 'c'
shouldFail = runParser myParser () "" "aabccac"

main :: IO ()
main = do
  print shouldParse1
  print shouldParse2
  print shouldParse3
  print shouldParse4
  print shouldFail
A first attempt might look like this:
myParser = option "" $ do
  str <- many (a <|> b <|> c)
  ch <- a <|> b
  return (str ++ [ch])
But the many just consumes all 'a', 'b' and 'c' characters in each test case, leaving a <|> b no characters to consume.
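To make the problem concrete, here is a small self-contained sketch that runs the first attempt on the inputs that are supposed to succeed. The name firstAttempt and the use of Parser from Text.Parsec.String are my own choices for illustration; the parser body is the one shown above.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- The first attempt, renamed (hypothetical name) so it can be run on its own.
firstAttempt :: Parser String
firstAttempt = option "" $ do
  str <- many (a <|> b <|> c)  -- greedily takes every 'a', 'b' and 'c'
  ch  <- a <|> b               -- so this is only ever tried at end of input
  return (str ++ [ch])

main :: IO ()
main = mapM_ (print . parse firstAttempt "") ["", "b", "ccccccb", "aabccab"]
```

Only the empty input should come back as a Right here. For the other inputs, many has already consumed characters by the time a <|> b fails, and option does not fall back to "" once its argument has failed after consuming input.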
The question:
Using parsec combinators, what is the correct implementation of ((a | b | c)* (a | b))? to define myParser?
We can also state this slightly differently: c in your parser may only succeed if it's followed by any token, which can be done with a single lookAhead:
myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof
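For anyone who wants to try it, here is a sketch of that definition dropped into a complete program with the test inputs from the question. The Parser type from Text.Parsec.String and the type signatures are assumptions on my part; the body of myParser is the line above.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- ((a | b | c)* (a | b))? followed by end of input.
-- A 'c' is only accepted when at least one more token follows it,
-- so the input can never end with a 'c'.
myParser :: Parser String
myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof

main :: IO ()
main = do
  -- these should all print a Right
  mapM_ (print . parse myParser "") ["", "b", "ccccccb", "aabccab"]
  -- this should print a Left because the input ends with a 'c'
  print (parse myParser "" "aabccac")
```

Note that the trailing eof matters: the lookAhead only checks that some token follows a 'c', not which one, so it is the combination with eof that rules out a final 'c'. When the last 'c' is rejected, c has already consumed input, so many fails instead of stopping quietly.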