I'm writing my first program with Parsec. I want to parse MySQL schema dumps and would like to come up with a nice way to parse strings representing certain keywords in a case-insensitive fashion. Here is some code showing the approach I'm using to parse "CREATE" or "create". Is there a better way to do this? An answer that doesn't resort to buildExpressionParser would be best. I'm taking baby steps here.
p_create_t :: GenParser Char st Statement
p_create_t = do
    x <- (string "CREATE" <|> string "create")
    xs <- manyTill anyChar (char ';')
    return $ CreateTable (x ++ xs) [] -- refine later
You can build the case-insensitive parser out of character parsers.
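import Data.Char (toLower, toUpper)

-- Match the lowercase or uppercase form of 'c'
caseInsensitiveChar :: Char -> GenParser Char st Char
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)

-- Match the string 's', accepting either lowercase or uppercase form of each character
caseInsensitiveString :: String -> GenParser Char st String
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> ("\"" ++ s ++ "\"")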
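As a rough sketch of how these two helpers could slot into the parser from the question: the Statement type below is a hypothetical stand-in (the question does not show its definition), and the import assumes the Text.ParserCombinators.Parsec compatibility module that provides GenParser.

```haskell
import Data.Char (toLower, toUpper)
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for the question's Statement type.
data Statement = CreateTable String [String]
  deriving (Show, Eq)

-- Match the lowercase or uppercase form of 'c'.
caseInsensitiveChar :: Char -> GenParser Char st Char
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)

-- Match 's' regardless of the case of each character.
-- 'try' makes a partial match backtrack instead of consuming input.
caseInsensitiveString :: String -> GenParser Char st String
caseInsensitiveString s =
  try (mapM caseInsensitiveChar s) <?> ("\"" ++ s ++ "\"")

-- The question's parser, now accepting "CREATE", "create", "Create", ...
p_create_t :: GenParser Char st Statement
p_create_t = do
  x  <- caseInsensitiveString "create"
  xs <- manyTill anyChar (char ';')
  return $ CreateTable (x ++ xs) [] -- refine later
```

Note that the parser returns the keyword as it appeared in the input, so parse p_create_t "" "CrEaTe TABLE t;" keeps the original "CrEaTe" spelling in the result.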
Repeating what I said in a comment, as it was apparently helpful:
The simple sledgehammer solution here is to map toLower over the entire input before running the parser, then do all your keyword matching in lowercase.
This presents obvious difficulties if you're parsing something that needs to be case-insensitive in some places and case-sensitive in others, or if you care about preserving case for cosmetic reasons. For example, although HTML tags are case-insensitive, converting an entire webpage to lowercase while parsing it would probably be undesirable. Even when compiling a case-insensitive programming language, converting identifiers could be annoying, as any resulting error messages would not match what the programmer wrote.
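A minimal sketch of the lowercase-everything approach, which also shows the drawback just described. The names parseLowercased and createStatement are made up for illustration, and the parser is deliberately as crude as the one in the question.

```haskell
import Data.Char (toLower)
import Text.ParserCombinators.Parsec

-- Hypothetical helper: lowercase the whole input, then run the parser.
parseLowercased :: Parser a -> String -> Either ParseError a
parseLowercased p = parse p "" . map toLower

-- Keywords only need to be matched in lowercase now.
createStatement :: Parser String
createStatement = do
  kw   <- string "create"
  rest <- manyTill anyChar (char ';')
  return (kw ++ rest)
```

Running parseLowercased createStatement "CREATE TABLE Users;" gives Right "create table users": the keyword matches, but the table name has lost its original capitalisation as well.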
No, Parsec cannot do that in a clean way. string is implemented on top of the primitive tokens combinator, which is hard-coded to use the equality test (==). It's a bit simpler to parse a case-insensitive character, but you probably want more.
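To illustrate why the single-character case is simpler: satisfy accepts an arbitrary predicate, so the comparison can be done on case-folded characters. This is only a sketch; charCI and exactCreate are hypothetical names, not part of Parsec.

```haskell
import Data.Char (toLower)
import Text.ParserCombinators.Parsec

-- Hypothetical helper: accept 'c' in either case by comparing
-- lowercased characters inside a 'satisfy' predicate.
charCI :: Char -> GenParser Char st Char
charCI c = satisfy (\x -> toLower x == toLower c) <?> show c

-- Plain 'string' compares with (==), so this matches "create" only.
exactCreate :: Parser String
exactCreate = string "create"
```

Here parse (charCI 'c') "" "CREATE" succeeds with 'C', while parse exactCreate "" "CREATE" fails, because string has no hook for supplying a different comparison.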
There is, however, a modern fork of Parsec called Megaparsec, which has built-in solutions for everything you may want:
λ> parseTest (char' 'a') "b"
parse error at line 1, column 1:
unexpected 'b'
expecting 'A' or 'a'
λ> parseTest (string' "foo") "Foo"
"Foo"
λ> parseTest (string' "foo") "FOO"
"FOO"
λ> parseTest (string' "foo") "fo!"
parse error at line 1, column 1:
unexpected "fo!"
expecting "foo"
Note the last error message: it's better than what you can get by parsing characters one by one (especially useful in your particular case). string' is implemented just like Parsec's string but uses case-insensitive comparison to compare characters. There are also oneOf' and noneOf', which may be helpful in some cases.
Disclosure: I'm one of the authors of Megaparsec.