Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Haskell's Parsec for Programming Language Converter

Say I have two languages (A & B). My goal is to write some type of program to convert the syntax found in A to the equivalent of B. Currently my solution has been to use Haskell's Parsec to perform this task. As someone who is new to Haskell and functional programming for that matter however, finding just a simple example in Parsec has been quite difficult. The examples I have found on the web are either incomplete examples (frustrating for a new Haskell programmer) or too much removed from my goal.

So can someone provide me with an amazingly trivial and explicit example of using Parsec for something related to what I'd like to achieve? Or perhaps even some tutorial that parallels my goal as well.

Thanks.

like image 259
Vincent Russo Avatar asked Jul 09 '12 19:07

Vincent Russo


1 Answers

Consider the following simple grammar of a CSV document (In ABNF):

csv   = *crow
crow  = *(ccell ',') ccell CR
ccell = "'" *(ALPHA / DIGIT) "'"

We want to write a converter that converts this grammar into a TSV (tabulator separated values) document:

tsv   = *trow
trow  = *(tcell HTAB) tcell CR
tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE

First of all, let's create an algebraic data type that descibes our abstract syntax tree. Type synonyms are included to ease understandment:

data XSV  = [Row]
type Row  = [Cell]
type Cell = String

Writing a parser for this grammar is pretty simple. We write a parser as if we would describe the ABNF:

csv :: Parser XSV
csv = XSV <$> many crow

crow :: Parser Row
crow = do cells <- ccell `sepBy` (char ',')
          newline
          return cells

ccell :: Parser Cell
ccell = do char '\''
           content <- many (digit <|> letter)
           char '\''
           return content

This parser uses do-notation. After a do, a sequence of statements follows. For parsers, these statements are simply other parsers. One can use <- to bind the result of a parser. This way, one builds a big parser by chaining multiple smaller parsers. To obtain interesting effects, one can also combine parser using special combinators (such as a <|> b, which parses either a or b or many a, which parses as many as as possible). Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, prefix it with try to enable backtracking for one instance. try slows down parsing.

The result is a parser csv that parses our CSV document into an abstract syntax tree. Now it is easy to turn that into another language (such as TSV):

xsvToTSV :: XSV -> String
xsvToTSV xst = unlines (map toLines xst) where
  toLines = intersperse '\t'

Connecting these two things one gets a conversion function:

csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
  Left _    -> Nothing
  Right xsv -> xsvToTSV xsv

And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.

like image 66
fuz Avatar answered Sep 22 '22 04:09

fuz