Which Haskell parsing technology is most pleasant to use, and why? [closed]

"Pleasant" meaning, for example: you can write grammars in a "natural" way without having to rewrite them in a convoluted way, and without having to introduce boring boilerplate.

Let's stipulate for the purposes of this question that, unless the performance of a technology is pathologically bad, performance isn't the biggest issue here.

Although, having said that, you might want to mention if a technology falls down when it comes to having to rewrite a grammar for performance reasons.

Please give me an idea of the size and complexity of grammars you have worked with, when answering this question. Also, whether you have used any notable "advanced" features of the technology in question, and what your impressions of those were.

Of course, the answer to this question may depend on the domain, in which case, I'd be happy to learn this fact.

asked Nov 03 '10 by Robin Green


2 Answers

It really depends what you start with and what you want to do. There isn't a one-size-fits-all answer.

If you have an LR grammar (e.g. you are working from a Yacc grammar), it is a good deal of work to turn it into an LL one suitable for Parsec or uu-parsinglib. That said, the many, sepBy etc. combinators are very helpful here, but you should expect the resulting parser to be slower than Happy+Alex.
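For a flavour of what those combinators buy you, here is a minimal sketch (my own illustration, not taken from any particular grammar) of an LL-friendly fragment using Parsec's many and sepBy - a parenthesised, comma-separated list of identifiers:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- an identifier: a letter followed by alphanumerics, trailing spaces skipped
    identifier :: Parser String
    identifier = (:) <$> letter <*> many alphaNum <* spaces

    -- a parenthesised, comma-separated list of identifiers, e.g. "(foo, bar, baz42)"
    argList :: Parser [String]
    argList = between (char '(' <* spaces) (char ')')
                      (identifier `sepBy` (char ',' <* spaces))

Running parseTest argList "(foo, bar, baz42)" prints ["foo","bar","baz42"]; the list structure that would be a pair of recursive productions in a Yacc/Happy grammar is a single sepBy here.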

For LL combinator parsing, uu-parsinglib and its predecessor uu-parsing are nice, but they lack something like Parsec's Token and Language modules, so they are perhaps less convenient. Some people like Malcolm Wallace's Parselib because they take a different approach to backtracking than Parsec, but I've no experience of them.
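To make that concrete, here is roughly what the Token and Language modules give you for free in Parsec (just an illustrative sketch starting from emptyDef; adjust the LanguageDef fields to taste):

    import Text.Parsec.String (Parser)
    import qualified Text.Parsec.Token as Tok
    import Text.Parsec.Language (emptyDef)

    -- one call to makeTokenParser yields a whole family of lexeme parsers
    lexer :: Tok.TokenParser ()
    lexer = Tok.makeTokenParser emptyDef
              { Tok.commentLine   = "--"
              , Tok.reservedNames = ["let", "in"]
              }

    identifier :: Parser String
    identifier = Tok.identifier lexer   -- skips trailing whitespace/comments, rejects reserved words

    integer :: Parser Integer
    integer = Tok.integer lexer

    parens :: Parser a -> Parser a
    parens = Tok.parens lexer

That one makeTokenParser call is the convenience being referred to above.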

If you are decoding some formatted file rather than something like a programming language, Attoparsec or similar might be better than Parsec or uu-parsinglib. "Better" in this context means faster: it isn't just ByteString vs. Char, but I think Attoparsec also does less work on error handling and source-location tracking, so its parsers should run faster because they do less work per input element.
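As a rough sketch of the sort of record-oriented decoding I mean (the format and field names here are made up for illustration), attoparsec lets you write something like:

    import Data.Attoparsec.ByteString.Char8
    import qualified Data.ByteString.Char8 as B

    -- one "timestamp,reading" record per line
    data Sample = Sample { stamp :: Int, reading :: Double } deriving Show

    sampleLine :: Parser Sample
    sampleLine = Sample <$> decimal <* char ',' <*> double <* endOfLine

    samples :: Parser [Sample]
    samples = many' sampleLine

    main :: IO ()
    main = print (parseOnly samples (B.pack "1,0.5\n2,0.75\n"))
    -- Right [Sample {stamp = 1, reading = 0.5},Sample {stamp = 2, reading = 0.75}]

You lose the nicer error positions Parsec gives you, but for a machine-generated format that trade is usually fine.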

Also, bear in mind that text file formats might not always have grammars as such, so you might have to define some custom combinators to do special lexical tricks rather than simply defining a "parser combinator" for each grammar element.
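As one small, hypothetical example of such a trick: a fixed-width field reader for column-oriented files, which isn't really a grammar production in the usual sense but is trivial as a custom combinator:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- read exactly n characters and strip the trailing space padding
    fixedField :: Int -> Parser String
    fixedField n = trim <$> count n anyChar
      where trim = reverse . dropWhile (== ' ') . reverse

    -- e.g. a record made of a 10-character name field and a 4-character code field
    record :: Parser (String, String)
    record = (,) <$> fixedField 10 <*> fixedField 4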

For LR parsing, I found Ralf Hinze's Frown nicer than Happy - better error support and a nicer format for grammar files - but Frown is not actively maintained and isn't on Hackage. I think it is LR(k) rather than LR(1), which makes it more powerful w.r.t. lookahead.

Performance is not really a big concern w.r.t. the grammar. Programming languages have complex grammars, but you can expect fairly small input files. As for data file formats, it really behoves the designer of the format to design it in such a way that it allows efficient parsing. For combinator parsers you shouldn't need many advanced features for a data-format file; if you do, either the format is badly designed (this sometimes happens, unfortunately) or your parser is.

For the record, I've written a C parser with Frown, a GL shading language parser with Happy, an unfinished C parser with UU_Parsing, and many things with Parsec. The choice for me comes down to what I start with: given an LR grammar, Frown or Happy (now Happy, as Frown isn't maintained); otherwise usually Parsec (as I said, uu-parsinglib is nice but lacks the convenience of LanguageDef). For binary formats I roll my own, but I usually have special requirements.

answered by stephen tetley


Recently, I recast in uu-parsinglib a DSL parser that had been written in Parsec. I found that it greatly simplified the program. My main motivation was to get the auto-correcting aspect. That just works - it's practically free! Also, I much preferred writing my parser in an applicative style as opposed to the monadic style of Parsec.
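Not my DSL, but just to illustrate the stylistic point - here is the same tiny parser written both ways (shown with Parsec, which supports both styles; the applicative shape is the one uu-parsinglib encourages):

    import Text.Parsec
    import Text.Parsec.String (Parser)

    data Assign = Assign String Int deriving Show

    number :: Parser Int
    number = read <$> many1 digit

    -- monadic style
    assignM :: Parser Assign
    assignM = do
      name <- many1 letter
      _    <- spaces >> char '=' >> spaces
      val  <- number
      return (Assign name val)

    -- applicative style: the grammar reads straight off the shape of the result
    assignA :: Parser Assign
    assignA = Assign <$> many1 letter <* spaces <* char '=' <* spaces <*> number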

answered by David Place