Say I have two languages (A & B). My goal is to write some type of program to convert the syntax found in A to the equivalent of B. Currently my solution has been to use Haskell's Parsec to perform this task. As someone who is new to Haskell and functional programming for that matter however, finding just a simple example in Parsec has been quite difficult. The examples I have found on the web are either incomplete examples (frustrating for a new Haskell programmer) or too much removed from my goal.
So can someone provide me with an amazingly trivial and explicit example of using Parsec for something related to what I'd like to achieve? Or perhaps even some tutorial that parallels my goal as well.
Thanks.
Consider the following simple grammar of a CSV document (In ABNF):
csv = *crow
crow = *(ccell ',') ccell CR
ccell = "'" *(ALPHA / DIGIT) "'"
We want to write a converter that converts this grammar into a TSV (tabulator separated values) document:
tsv = *trow
trow = *(tcell HTAB) tcell CR
tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE
First of all, let's create an algebraic data type that descibes our abstract syntax tree. Type synonyms are included to ease understandment:
data XSV = [Row]
type Row = [Cell]
type Cell = String
Writing a parser for this grammar is pretty simple. We write a parser as if we would describe the ABNF:
csv :: Parser XSV
csv = XSV <$> many crow
crow :: Parser Row
crow = do cells <- ccell `sepBy` (char ',')
newline
return cells
ccell :: Parser Cell
ccell = do char '\''
content <- many (digit <|> letter)
char '\''
return content
This parser uses do
-notation. After a do
, a sequence of statements follows. For parsers, these statements are simply other parsers. One can use <-
to bind the result of a parser. This way, one builds a big parser by chaining multiple smaller parsers. To obtain interesting effects, one can also combine parser using special combinators (such as a <|> b
, which parses either a
or b
or many a
, which parses as many a
s as possible). Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, prefix it with try
to enable backtracking for one instance. try
slows down parsing.
The result is a parser csv
that parses our CSV document into an abstract syntax tree. Now it is easy to turn that into another language (such as TSV):
xsvToTSV :: XSV -> String
xsvToTSV xst = unlines (map toLines xst) where
toLines = intersperse '\t'
Connecting these two things one gets a conversion function:
csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
Left _ -> Nothing
Right xsv -> xsvToTSV xsv
And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With